Whether submitting an interactive job or a batch job, the first step is to declare the job's resource requirements. These resources typically include: a number of CPU cores; an amount of memory; and a length of time. Based on this information, the job scheduler will determine when and where the job will run. During the time allocated to the job, the physical resources such as CPU cores and memory will be dedicated to the job for its sole use. If use of memory or time exceeds the amount requested then the job will be killed by the resource manager. Alternatively, if the requested resources far exceed the actual required resources then the job will run, but it will be unnecessarily delayed while waiting for all of the resources to become available. Consequently, it is very important to specify resource requirements as accurately as possible.
The following flags can be used to specify the resource requirements for your job. The nodes flag has a number of different options that can be specified; further information is given below.
|nodes||Number of compute nodes that you want the job to run on. Default is 1 node. Other options appear below.|
|vmem||The total amount of memory for the whole job. Default is 1GB.|
|pvmem||As an alternative to vmem, use pvmem to express the required memory per process.|
|walltime||The expected run time for the job in the form HH:MM:SS. Default is 1:00:00 (1 hour).|
|ppn||Number of CPU cores per compute node requested. Default is 1 core per node.|
|E7540||Run only on nodes with an Intel E7540 CPU.|
|X5660||Run only on nodes with an Intel X5660 CPU.|
|X5675||Run only on nodes with an Intel X5675 CPU.|
|E52670||Run only on nodes with an Intel E5-2670 CPU.|
|E52650V2||Run only on nodes with an Intel E5-2650v2 CPU.|
|X56||Run only on nodes with an Intel X56xx CPU, i.e. X5660 or X5675.|
|E5||Run only on nodes with an Intel E5-xxxx CPU, i.e. E5-2670 or E5-2650v2.|
Compute Nodes, CPU Cores and Processor Selection
The number of CPU cores required by a job is specified with nodes=X:ppn=Y, where X represents the number of compute nodes and Y represents the number of CPU cores per node. For example, a single-threaded job requires nodes=1:ppn=1, a multi-threaded job may require nodes=1:ppn=12, and a distributed memory job may require more than one node, such as the nodes=2 request in the example below.
You can also specify the required type of processor by adding a colon followed by a processor code from the table above. For example, adding :X56 to a node statement will restrict the job to nodes with an Intel X5660 or X5675 CPU:
#PBS -l nodes=2:X56:ppn=12
#PBS -l vmem=96gb
#PBS -l walltime=12:00:00
Note: If you want to specify the CPU then you should look at the Katana node list to see what nodes you have access to. If you request more than 12 hours of WALLTIME then you can only use the nodes bought by your school or research group, or the Faculty of Science. A long running job that specifies a CPU that you don't have access to will never start.
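As a rough illustration of how these directives sit inside a batch script, the sketch below wraps the request shown above in a minimal job script. The body is hypothetical: the echo lines stand in for the real workload, and the fallbacks written with `:-` are there only so the script can also be tried outside the scheduler. PBS_O_WORKDIR and PBS_NODEFILE are environment variables normally set by the scheduler when the job starts.

```shell
#!/bin/sh
#PBS -l nodes=2:X56:ppn=12
#PBS -l vmem=96gb
#PBS -l walltime=12:00:00

# Total cores requested: X nodes times Y cores per node.
NODES=2
PPN=12
TOTAL_CORES=$((NODES * PPN))

# PBS_O_WORKDIR is the directory qsub was run from; fall back to the
# current directory when the script is run outside the scheduler.
cd "${PBS_O_WORKDIR:-$PWD}" || exit 1

echo "Requested cores: $TOTAL_CORES"

# PBS_NODEFILE typically lists the allocated nodes, one line per core.
if [ -n "${PBS_NODEFILE:-}" ] && [ -r "$PBS_NODEFILE" ]; then
    echo "Allocated node entries: $(wc -l < "$PBS_NODEFILE")"
fi
```

The shell ignores the #PBS lines as comments, while the scheduler reads them as if they had been given to qsub on the command line.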
The amount of memory required by a job is specified with vmem=X, where X can be expressed in terms of megabytes or gigabytes. For example, vmem=96gb requests 96 gigabytes, as in the example above; a job with relatively modest memory requirements would request far less.
Note: Because of node ownership, if you are not part of the School of Mathematics and Statistics, UNSW Business School or the Climate Change Research Centre then a job requiring more than 128GB of memory will only run if it has a WALLTIME of 12 hours or less.
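To compare a per-process request (pvmem) with a whole-job request (vmem), the arithmetic can be sketched as below. This assumes one process per requested core, which may not hold for your application, and the function name total_vmem_gb is made up for illustration.

```shell
# total_vmem_gb NODES PPN PVMEM_GB
# Prints the whole-job memory (in GB) implied by a per-process request,
# ASSUMING one process runs on each requested core.
total_vmem_gb() {
    echo $(( $1 * $2 * $3 ))
}

# 2 nodes x 12 cores per node x 4 GB per process
TOTAL=$(total_vmem_gb 2 12 4)
echo "nodes=2:ppn=12 with pvmem=4gb is roughly comparable to vmem=${TOTAL}gb"
```

Under that assumption, the request for 4 GB per process across 24 cores corresponds to the vmem=96gb figure used in the earlier example.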
The expected run-time of a job is specified with walltime=HH:MM:SS, where HH, MM and SS are hours, minutes and seconds. This is the time required by the job as measured by a wall clock. In other words, it is not time spent executing instructions in the CPU, and it is independent of the number of CPUs requested. Note that long jobs are more constrained than short jobs. See the Job Scheduling section for further details. The run time will also vary depending on which compute node the job runs on. The Katana node list provides information, including benchmarks, on the CPUs in the nodes that you are using.
Note: Unless you have a specific reason to choose a different number, the walltime that you request should be one of 12:00:00, 48:00:00, 100:00:00 or 200:00:00. These numbers represent the longest time that jobs can run in each of the queues.
Note: If possible, break up your jobs so that they need less than 12 hours of run time. That way they will likely start sooner, as they can run on any node.
Note: If you request a WALLTIME of greater than 200 hours then your job WILL NOT run unless you are a member of the Astrobiology group.
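The walltime notes above amount to rounding an estimate up to the nearest queue limit. A minimal sketch of that rule follows; pick_walltime is a hypothetical helper, not a command available on the cluster, and it takes the estimate in whole hours.

```shell
# pick_walltime HOURS
# Prints the smallest queue limit (12, 48, 100 or 200 hours) that covers
# an estimated run time given in whole hours; fails above 200 hours.
pick_walltime() {
    for limit in 12 48 100 200; do
        if [ "$1" -le "$limit" ]; then
            echo "${limit}:00:00"
            return 0
        fi
    done
    echo "estimate of $1 hours exceeds the 200 hour limit" >&2
    return 1
}

# A job expected to take about 30 hours falls in the 48 hour queue.
echo "#PBS -l walltime=$(pick_walltime 30)"
```

The result could be pasted into a job script or passed to qsub with -l. An estimate above 200 hours is reported as an error rather than rounded, since such jobs will not run for most users.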
All of these resource requirements are passed to the qsub command with the -l (lowercase L) flag. You may use the -l flag multiple times, once for each type of resource requirement, or you can follow a single -l with a comma-separated list of all your resource requirements. If resource requirements for CPU cores, memory or time are not explicitly provided then the default values from the table above will be used: nodes=1:ppn=1, vmem=1gb and walltime=1:00:00.
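The two ways of passing resource requirements with -l can be sketched as follows. The script name myjob.pbs and the vmem=24gb value are placeholders chosen for illustration, and the commands are printed rather than executed so that the sketch can be tried on a machine without qsub.

```shell
# Hypothetical job script name, used only for illustration.
SCRIPT=myjob.pbs

# Form 1: one -l flag per resource requirement.
SEPARATE="qsub -l nodes=1:ppn=12 -l vmem=24gb -l walltime=12:00:00 $SCRIPT"

# Form 2: a single -l flag followed by a comma-separated list.
COMBINED="qsub -l nodes=1:ppn=12,vmem=24gb,walltime=12:00:00 $SCRIPT"

# Print the commands instead of running them.
echo "$SEPARATE"
echo "$COMBINED"
```

Both forms request the same resources, so the choice between them is a matter of readability.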
Finally, please note that a job requesting more resources than are physically available will not necessarily be rejected. For example, a job that requests 64 CPU cores in a single compute node (nodes=1:ppn=64) will not run because there is no node with that many CPU cores, but the job will remain in the queue until such a node magically appears. The pestat command provides a useful summary of the physical resources available in the cluster and the checkjob command can be helpful when determining why a job is stuck in the queue.
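A stuck job might be investigated along the following lines. This is only a sketch: the job ID 12345 is a placeholder for the ID that qsub printed at submission, and show_or_run is a hypothetical wrapper that lets the script run harmlessly on a machine where pestat and checkjob are not installed.

```shell
# show_or_run CMD [ARGS...]
# Runs CMD if it is installed; otherwise prints what would have been run.
show_or_run() {
    if command -v "$1" >/dev/null 2>&1; then
        "$@"
    else
        echo "not available here: $*"
    fi
}

# Placeholder job ID; replace with the ID printed by qsub.
JOBID=${1:-12345}

# Summary of the nodes and their physical resources.
show_or_run pestat

# The scheduler's view of the job, including why it is still waiting.
show_or_run checkjob "$JOBID"
```

Comparing the request against the pestat summary is usually the quickest way to spot a request, such as ppn=64, that no node can satisfy.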
If you are running a MATLAB job then you should look at the MATLAB page for MATLAB-specific resource requirement information.