HPC Jobs

The head node of a cluster is a resource shared by all users. It is used for preparing, submitting and managing jobs, so never run computationally intensive processes on it. Jobs are submitted from the head node but actually run on one or more of the compute nodes. Allocating jobs to compute nodes and managing them during their lifetime is the responsibility of two pieces of software: the resource manager, which tracks the compute nodes and launches jobs on them, and the job scheduler, which decides which job runs where and when. Different clusters use different tools for these tasks. For example, Katana uses a resource manager called Torque (derived from an older system, the Portable Batch System or PBS) and a job scheduler called Maui.

Jobs are submitted with the qsub command, which accepts two types of job: interactive jobs and batch jobs. An interactive job provides a login session on a compute node, where you can issue any sequence of commands directly. This makes interactive jobs useful for experimentation and debugging. A batch job, in contrast, is a scripted job that runs from start to finish without any user intervention. The vast majority of jobs on the cluster are batch jobs, and this type of job is appropriate for production runs lasting several hours or days.
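As a rough illustration of the two submission modes, the Python sketch below builds the qsub argument list for each. It assumes the standard Torque qsub options: -I requests an interactive job, and -l passes a resource list such as nodes=1:ppn=1,walltime=01:00:00 (one node, one processor per node, one hour of wall time). The helper name qsub_command, the script name myjob.pbs and all resource values are hypothetical placeholders; check your cluster's documentation for the limits that actually apply.

```python
import shlex
from typing import List, Optional


def qsub_command(nodes: int = 1, ppn: int = 1, walltime: str = "01:00:00",
                 script: Optional[str] = None) -> List[str]:
    """Build a Torque/PBS qsub argument list (hypothetical helper).

    With no script, the list requests an interactive job (-I).
    With a script path, it submits that script as a batch job.
    """
    # Resource request: node count, processors per node, wall-clock limit.
    resources = f"nodes={nodes}:ppn={ppn},walltime={walltime}"
    if script is None:
        # Interactive job: qsub opens a login session on a compute node.
        return ["qsub", "-I", "-l", resources]
    # Batch job: options come first, the job script path comes last.
    return ["qsub", "-l", resources, script]


if __name__ == "__main__":
    # Print the equivalent shell commands for both job types.
    print(shlex.join(qsub_command()))
    print(shlex.join(qsub_command(ppn=8, walltime="48:00:00", script="myjob.pbs")))
```

On the head node, the batch form could be passed to subprocess.run(..., check=True). For an interactive session it is simpler to type the printed command directly, since qsub -I waits until a compute node is allocated and then connects your terminal to the login session on it.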

To submit a batch job with qsub, you first need to create a job script that specifies the resources your job requires and then runs your program. The general structure of a job script is shown below.