Job Scheduling and Queues

The job scheduler is responsible for deciding when and where jobs will run. The rules that influence these decisions are defined by the job scheduling policy. Such a policy considers factors such as the time a job has spent waiting in a queue, the resources required by the job and the usage history of the job owner. The job scheduling policy on Katana attempts to accommodate the various needs of its users and to ensure that all those who invest in Katana receive a fair share of its compute resources. Some parts of the job scheduling policy apply equally to all users, but some groups also define local rules that apply only to their own users.

One of the most important features of the Katana job scheduling policy is that its influence over a job ends once the job has been dispatched to a compute node. When a job starts running on a compute node, it will only be interrupted if it exceeds the resources it requested. Jobs will not be preempted to make way for a later job with higher priority. Instead, the scheduling policy offers incentives for users to submit jobs with relatively short run-times. Shorter run-times give the scheduler more opportunities to maintain fair usage without resorting to preemption.

The primary incentive for users to submit short jobs is that jobs with an expected run-time of 12 hours or less can run on any compute node across the whole cluster, whereas jobs that request more than 12 hours can only run on compute nodes purchased by the user's research group, school or faculty. With more compute nodes available to a 12-hour job, there is a greater chance of finding a compute node with the required combination of resources, and consequently the job will spend less time in the queue.
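The 12-hour rule can be pictured as a filter on the set of candidate nodes. The sketch below is purely illustrative and is not how the Katana scheduler is implemented: the `Node` type, its `owner` field and the `eligible_nodes` function are hypothetical names chosen for this example, and "owner" stands for whichever research group, school or faculty purchased the node.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set

SHORT_JOB_HOURS = 12  # threshold stated in the text


@dataclass(frozen=True)
class Node:
    name: str   # hypothetical node name
    owner: str  # group, school or faculty that purchased the node


def eligible_nodes(walltime_hours: float, user_owners: Set[str],
                   nodes: Iterable[Node]) -> List[Node]:
    """Return the nodes a job may run on under the 12-hour rule (illustrative only)."""
    nodes = list(nodes)
    if walltime_hours <= SHORT_JOB_HOURS:
        # Short jobs may run on any compute node in the cluster.
        return nodes
    # Longer jobs are confined to nodes bought by the user's own group, school or faculty.
    return [n for n in nodes if n.owner in user_owners]
```

For example, with one node owned by `"maths"` and one by `"ccrc"`, a 12-hour job from a Mathematics user sees both nodes, while a 13-hour job from the same user sees only the `"maths"` node.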

There are also scheduling rules that further constrain long jobs. These rules are defined by the owners of the compute nodes. On compute nodes purchased by ASB, CCRC and the School of Mathematics and Statistics, the following limits apply:

  • The maximum run-time of any job is 200 hours.
  • Each group is restricted to 64 CPU cores, shared by all of the group's jobs with a run-time greater than 100 hours.
  • Each user is restricted to 72 CPU cores for all jobs with a run-time between 48 and 100 hours.
  • Each user is restricted to 96 CPU cores for all jobs with a run-time between 12 and 48 hours.
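The limits listed above divide jobs into run-time bands, each with its own core limit and scope (per user or per group). The following sketch maps a requested run-time to the band limit that would apply on the ASB, CCRC and Mathematics and Statistics nodes. It is an illustration only, and the function name `core_limit` is hypothetical. The text says "between" without specifying which end of each band is inclusive, so treating each band as (lower, upper] is an assumption, as is representing "no band limit" as `None` for jobs of 12 hours or less.

```python
from typing import Optional, Tuple

MAX_RUNTIME_HOURS = 200  # maximum run-time on these nodes

# (lower bound, exclusive; upper bound, inclusive; scope; core limit)
# ASSUMPTION: band boundaries are (lower, upper]; the text only says "between".
RUNTIME_BANDS = [
    (12, 48, "user", 96),
    (48, 100, "user", 72),
    (100, MAX_RUNTIME_HOURS, "group", 64),
]


def core_limit(walltime_hours: float) -> Optional[Tuple[str, int]]:
    """Return (scope, max cores) for the job's run-time band.

    Returns None for jobs of 12 hours or less (no band-specific limit) and
    raises ValueError if the request exceeds the maximum run-time.
    """
    if walltime_hours > MAX_RUNTIME_HOURS:
        raise ValueError(f"run-time exceeds {MAX_RUNTIME_HOURS} hours")
    for low, high, scope, cores in RUNTIME_BANDS:
        if low < walltime_hours <= high:
            return scope, cores
    return None
```

Under these assumptions, a 72-hour job falls in the 48 to 100 hour band, so `core_limit(72)` returns `("user", 72)`, while a 150-hour job counts against the group-wide pool of 64 cores.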

Currently, all groups on Katana impose the 200-hour run-time limit on their users, with the exception of Astrobiology, which sets the maximum run-time at 400 hours.

There is also an upper limit on CPU and memory usage that applies to all jobs belonging to an individual user regardless of run-time. This limit is based on the number of CPU cores purchased by the user's research group or school plus 5% of the total CPU cores in the cluster. Consequently, this limit increases over time as more compute nodes are purchased and installed.
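As a worked example of the per-user cap, the sketch below adds a group's purchased cores to 5% of all cores in the cluster. It models CPU cores only, although the text also mentions memory. The function name is hypothetical, the core counts in the example are invented, and rounding the 5% share down to a whole core is an assumption, since the text does not say how fractions are handled.

```python
def per_user_core_cap(group_cores: int, cluster_cores: int) -> int:
    """Cores purchased by the user's group or school plus 5% of all cluster cores.

    ASSUMPTION: the 5% share is rounded down to a whole number of cores.
    Integer arithmetic avoids floating-point rounding surprises.
    """
    return group_cores + (cluster_cores * 5) // 100
```

With hypothetical figures of 640 cores owned by the group and 10,000 cores in the cluster, the cap is 640 + 500 = 1,140 cores. If the cluster later grows to 12,000 cores, the same user's cap rises to 1,240, which is why the limit increases as more compute nodes are installed.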

Once you have reached any of the limits listed above (including the limit on cores based on the number of cores purchased), the scheduler will give any new job you submit the status "blocked". The job remains blocked until the resources it requests, combined with those of all your active and idle jobs, fall within the limit.
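The blocking rule can be sketched as a simple admission check: add the new job's cores to the cores held by the user's active and idle jobs, and block the new job if the total would exceed the cap. This is an illustration, not the scheduler's actual code. The `Job` type, the state strings and `new_job_state` are hypothetical. The check treats a total exactly equal to the cap as allowed; the wording "below the limit" leaves that boundary open, so this is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Job:
    cores: int
    state: str  # hypothetical labels: "active", "idle" or "blocked"


def new_job_state(new_cores: int, existing: Iterable[Job], cap: int) -> str:
    """Return "blocked" if the new job would push the user's total past the cap."""
    # Only active and idle jobs count toward the limit; other blocked jobs do not.
    in_use = sum(j.cores for j in existing if j.state in ("active", "idle"))
    # ASSUMPTION: reaching the cap exactly is allowed.
    return "blocked" if in_use + new_cores > cap else "idle"
```

For example, a user with 100 cores in active jobs and 40 in idle jobs, and a hypothetical cap of 150, can queue a 10-core job, but an 11-core job would be blocked until some of the existing jobs finish.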