UNSW - Science - HPC

Chaining Batch Jobs

If your data processing can be split into multiple steps then rather than creating one large batch job you may want to split it up into a number of smaller jobs. Some of the reasons that you may wish to do this are:

  • Your large job runs for over 200 hours.
  • Your job has multiple steps which use different amounts of resources at each step.

Chained Batch Job Examples

The simplest way to have a new job submitted as soon as the earlier job finishes is to add the following line

qsub newjob.pbs

to the end of your existing job script giving something like this:

#!/bin/bash
 
#PBS -N BASIC
#PBS -l nodes=1:ppn=1
#PBS -l vmem=1gb
#PBS -l walltime=1:00:00
#PBS -j oe
 
cd $PBS_O_WORKDIR
 
./my_prog
 
cd $HOME
 
qsub newjob.pbs

If you use this approach then you know that the output from the first job will be ready in time for your new job to run. There are two downsides to this approach:

  1. The job does not enter the queue until the first job is completed.
  2. The next job will not run if the first job does not finish for some reason.

We can't do anything about the first issue without using array jobs, but we can use an epilogue script to solve the second issue.
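Torque supports attaching a per-job epilogue script at submission time with qsub -l epilogue=/path/to/script; the epilogue receives the job's exit status as its tenth argument, so it can chain the next job only when the first one succeeded. The sketch below illustrates the idea; the script path and nextjob.pbs name are placeholders, and the details of epilogue arguments may differ on your cluster, so check your local Torque documentation.

```shell
#!/bin/bash
# epilogue.sh - a sketch of a per-job epilogue that chains the next job.
# Attach it at submission time with:
#     qsub -l epilogue=$HOME/epilogue.sh firstjob.pbs
# (the epilogue file must be executable). Torque passes the job's exit
# status to the epilogue as argument 10.

resubmit_if_ok() {
    local exit_status="$1"
    if [ "$exit_status" -eq 0 ]; then
        # First job finished cleanly: chain the next one.
        qsub "$HOME/nextjob.pbs"
    else
        echo "Job failed with status $exit_status; not submitting nextjob.pbs" >&2
    fi
}

resubmit_if_ok "${10:-1}"
```

This way the follow-up job is only ever submitted after a successful run, which addresses the second downside above.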

Array Jobs

If you have an array job that starts 500 sub-jobs (about the most that you should submit at one time), you can tell it to submit a new job as it runs the 450th sub-job, in the following way.

#!/bin/bash
 
#PBS -N ARRAY2
#PBS -l nodes=1:ppn=1
#PBS -l vmem=1gb
#PBS -l walltime=1:00:00
#PBS -j oe
 
#PBS -t 1-500
 
if [ "$PBS_ARRAYID" == "450" ]
then
    qsub nextjobs.pbs
fi
 
cd $PBS_O_WORKDIR
 
./my_prog $PBS_ARRAYID

Unfortunately, the example above requires us to have multiple job scripts ready to go. A better approach in many situations is to take information from the current job script and pass it along to the next one using environment variables, in the following way:

#!/bin/bash
 
#PBS -N ARRAY2
#PBS -l nodes=1:ppn=1
#PBS -l vmem=1gb
#PBS -l walltime=1:00:00
#PBS -j oe
 
#PBS -t 1-500
 
 
cd $PBS_O_WORKDIR
 
./my_prog $PBS_ARRAYID
 
if [ "$PBS_ARRAYID" == "450" ]
then
    export MY_JOB_ID=$PBS_ARRAYID
    # -v is needed so that MY_JOB_ID is passed into the new job's environment
    qsub -v MY_JOB_ID nextjobs.pbs
fi

and then we can use

$MY_JOB_ID

in the job script nextjobs.pbs.
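For illustration, a minimal nextjobs.pbs that picks the value up might look like the sketch below. The echo stands in for your real program, and note that qsub only forwards environment variables when asked, e.g. with qsub -v MY_JOB_ID nextjobs.pbs or qsub -V.

```shell
#!/bin/bash

#PBS -N NEXTJOB
#PBS -l nodes=1:ppn=1
#PBS -l vmem=1gb
#PBS -l walltime=1:00:00
#PBS -j oe

cd ${PBS_O_WORKDIR:-.}

# MY_JOB_ID comes from the submitting job's environment; it is only
# visible here if the job was submitted with qsub -v MY_JOB_ID (or -V).
# Replace the echo with your real program, e.g. ./my_prog $MY_JOB_ID
echo "Continuing from array index ${MY_JOB_ID:-unset}"
```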

Even better, we can just use the single job script alljobs.pbs shown below (only the resubmission logic is shown; the usual #PBS directives and program call go above it):

MAX_INDEX_VALUE=44000
 
export UPPER_END=$(($PBS_ARRAYID + 100))
 
if (($UPPER_END % 500 == 0))
then
    export NEXT_MAX=$(($UPPER_END + 500))
    if [ "$UPPER_END" -lt "$MAX_INDEX_VALUE" ]
    then
        # Start the next clump one past the end of the current one
        # so that no index is run twice.
        qsub -t $(($UPPER_END + 1))-${NEXT_MAX} $0
    fi
fi

and then start it all off using the command

qsub -t 1-500 alljobs.pbs

which will then start at the top and go from there.
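To see the chain this sets up, it helps to enumerate the clumps it walks through. The loop below is a stand-alone sketch (not part of the job script, and nothing is actually submitted) that prints the expected sequence of submissions for 500-task clumps up to the MAX_INDEX_VALUE of 44,000 used above:

```shell
# Stand-alone sketch: print the clump submissions the self-resubmitting
# array job is expected to walk through (nothing is actually submitted).
list_clumps() {
    local max="$1" size="$2"
    local start=1
    local end="$size"
    while [ "$end" -le "$max" ]; do
        echo "qsub -t ${start}-${end} alljobs.pbs"
        start=$((end + 1))
        end=$((end + size))
    done
}

list_clumps 44000 500   # from "qsub -t 1-500 ..." to "qsub -t 43501-44000 ..."
```

Each clump's last trigger task submits the next non-overlapping range, so the whole 44,000-index job is worked through 500 tasks at a time.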

Sometimes you have thousands of little calculations, and you want both to combine individual calculations into groups of 20 or so and to avoid overloading the system by submitting only a few hundred jobs at a time. In the examples below we have 32,000 small calculations that we want to run. As we know that they finish fairly quickly, we can run 20 calculations per job and still fit within the magic 12 hour limit, so we will be submitting a total of 32,000 / 20 = 1,600 jobs to the system. As 1,600 divides nicely into clumps of 400, which is under the limit of 500 that we like to use, we will submit the jobs 400 at a time. In both examples, the variable used for the index is $key.
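The mapping from array task to calculation keys used in both scripts below is simple arithmetic: task N covers keys (N-1)*20+1 through N*20. A quick stand-alone check of the endpoints confirms that task 1 covers keys 1-20 and task 1,600 covers keys 31,981-32,000:

```shell
# Stand-alone check of the index arithmetic used in the job scripts below:
# array task N covers calculation keys (N-1)*20+1 .. N*20.
key_range() {
    local task="$1"
    local start=$(( (task - 1) * 20 + 1 ))
    local end=$(( start + 19 ))
    echo "${start}-${end}"
}

key_range 1      # → 1-20
key_range 1600   # → 31981-32000
```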

Our first example calls a Python program with an argument of the index variable.

#!/bin/bash
 
#PBS -N job_script
#PBS -l nodes=1:ppn=1,vmem=4gb
#PBS -l walltime=12:00:00
#PBS -j oe
#PBS -M me@unsw.edu.au
#PBS -m ae
 
MAX_INDEX_VALUE=1600
 
module add python/2.7.9 
 
cd $HOME
 
((START_NUMBER = ($PBS_ARRAYID - 1 ) * 20 + 1))
((END_NUMBER = $START_NUMBER + 19))
 
for key in $(seq $START_NUMBER $END_NUMBER)
do
    python my_python_script.py $key
done
 
 
export UPPER_END=$(($PBS_ARRAYID + 100))
 
if (($UPPER_END % 400 == 0))
then
    export NEXT_MAX=$(($UPPER_END + 400))
    if [ "$UPPER_END" -lt "$MAX_INDEX_VALUE" ]
    then
        # Start the next clump one past the end of the current one.
        qsub -t $(($UPPER_END + 1))-${NEXT_MAX} $0
    fi
fi

The next example is of an R script that uses the variable $key internally.

#!/bin/bash
 
#PBS -N job_script
#PBS -l nodes=1:ppn=1,vmem=4gb
#PBS -l walltime=12:00:00
#PBS -j oe
#PBS -M me@unsw.edu.au
#PBS -m ae
 
MAX_INDEX_VALUE=1600
 
module add R/3.2.2
 
cd $HOME
 
((START_NUMBER = ($PBS_ARRAYID - 1 ) * 20 + 1))
((END_NUMBER = $START_NUMBER + 19))
 
for key in $(seq $START_NUMBER $END_NUMBER)
do
    # export so that the R code can read it with Sys.getenv("key")
    export key
    Rscript $HOME/my_code.r
done
 
 
export UPPER_END=$(($PBS_ARRAYID + 100))
 
if (($UPPER_END % 400 == 0))
then
    export NEXT_MAX=$(($UPPER_END + 400))
    if [ "$UPPER_END" -lt "$MAX_INDEX_VALUE" ]
    then
        # Start the next clump one past the end of the current one.
        qsub -t $(($UPPER_END + 1))-${NEXT_MAX} $0
    fi
fi