Array Jobs
One common use of computational clusters is for parametric sweeps. This involves running many instances of the same application, each with different input data. Manually creating and managing large numbers of such jobs would be quite tedious. However, Torque supports array jobs, which greatly simplify the process.
An array job is a single job script that spawns many almost identical sub-jobs. The only difference between the sub-jobs is an environment variable, PBS_ARRAYID, whose value uniquely identifies an individual sub-job. A regular job becomes an array job when it uses the -t flag to express the required range of values for PBS_ARRAYID. For example, the following script will spawn 100 sub-jobs. Each sub-job will require one CPU core, 1 GB of memory and 1 hour of run-time, and will execute the same application. However, a different input file will be passed to the application within each sub-job: the first sub-job will read input data from a file called 1.dat, the second from a file called 2.dat, and so on.
#!/bin/bash
#PBS -l nodes=1:ppn=1,vmem=1gb
#PBS -l walltime=1:00:00
#PBS -j oe
#PBS -t 1-100
cd ${PBS_O_WORKDIR}
./myprogram ${PBS_ARRAYID}.dat
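Because each sub-job expects its own input file, it can be worth checking that every file exists before submitting. The following is a minimal sketch of such a check, not part of Torque: the function name missing_inputs and its arguments are hypothetical, and it assumes the N.dat naming used in the script above.

```shell
#!/bin/bash
# missing_inputs FIRST LAST [DIR]: hypothetical pre-submission check.
# Prints the name of every N.dat with N in FIRST..LAST that is absent from DIR
# (DIR defaults to the current directory). Prints nothing if all files exist.
missing_inputs() {
    local first=$1 last=$2 dir=${3:-.}
    local id
    for ((id = first; id <= last; id++)); do
        [ -f "${dir}/${id}.dat" ] || echo "${id}.dat"
    done
}

# Usage, run in the directory holding the data files before calling qsub:
#   missing_inputs 1 100
```

If the function prints nothing, all 100 input files are present and the array job can be submitted.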
If you have two independent parameters that you want to cycle through, there are a number of different ways to do it. The simplest is to write one array job script and then use the Bash command line to submit it several times, once for each value of the second parameter. For example, if you have data files red_1, ..., red_12, green_1, ..., green_12, blue_1, ..., blue_12, yellow_1, ..., yellow_12, you can submit one array job per colour with:
for MY_VAR in red green blue yellow; do export MY_VAR; qsub array.pbs; done
where the job script shown below is saved as array.pbs. To make the variable MY_VAR available within the job script, the line
#PBS -v MY_VAR
has been added to the directives at the top of the script.
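#!/bin/bash
# PBS directives are not shell-expanded, so $MY_VAR cannot appear in the job name here.
# To include the colour in the name, pass -N on the qsub command line instead.
#PBS -N ARRAY4
#PBS -l nodes=1:ppn=1
#PBS -l vmem=1gb
#PBS -l walltime=1:00:00
#PBS -j oe
#PBS -v MY_VAR
#PBS -t 1-12
cd $HOME
./my_prog ${MY_VAR}_${PBS_ARRAYID}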
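Another way to cover two parameters is to submit a single array job with 4 x 12 = 48 sub-jobs and derive both parameters from PBS_ARRAYID using integer arithmetic. The sketch below illustrates the idea under that assumption; the function name decode_index is hypothetical, and the colour list and the count of 12 simply mirror the data files named above.

```shell
#!/bin/bash
# decode_index N: map a 1-based array index (1..48) to "colour number".
# Indices 1-12 give red 1..12, 13-24 give green 1..12, and so on.
decode_index() {
    local colours=(red green blue yellow)
    local per_colour=12
    local zero=$(( $1 - 1 ))                          # switch to 0-based
    local colour=${colours[$(( zero / per_colour ))]} # which block of 12
    local number=$(( zero % per_colour + 1 ))         # position inside block
    echo "${colour} ${number}"
}

# Inside a job script submitted with "#PBS -t 1-48" this could be used as:
#   read -r MY_VAR N <<< "$(decode_index "${PBS_ARRAYID}")"
#   ./my_prog "${MY_VAR}_${N}"
```

With this approach only one qsub call is needed and no variable has to be exported, at the cost of a slightly less readable job script.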
Note: If you use an array job to start many copies of a program then, depending on the application, you may run into problems because many nearly identical jobs start at the same moment. If this occurs, you can add a random wait by placing the following line in your script immediately before the line that launches the application. It delays each sub-job by a random time of between 0 and 239 seconds.
sleep $((RANDOM % 240))
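If a predictable delay is preferred over a random one, the start times can instead be spread out according to the sub-job's array ID. This is only a sketch of that alternative, not something Torque provides: the helper name stagger_delay and its default step of 5 seconds and window of 240 seconds are assumptions chosen for illustration.

```shell
#!/bin/bash
# stagger_delay ID [STEP] [WINDOW]: hypothetical helper that prints a delay in seconds.
# Sub-job ID waits (ID - 1) * STEP seconds, wrapped so the delay stays below WINDOW.
stagger_delay() {
    local id=$1 step=${2:-5} window=${3:-240}
    echo $(( (id - 1) * step % window ))
}

# In a job script, immediately before launching the application:
#   sleep "$(stagger_delay "${PBS_ARRAYID}")"
```

With the defaults, sub-job 1 starts immediately, sub-job 2 waits 5 seconds, and the pattern repeats every 48 sub-jobs.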
There are some more examples of array jobs including how to group your computations in an array job on the examples page.