UNSW - Science - HPC

Condor Overview

Condor is a batch system for running jobs on a collection of computational resources. Like most batch systems, it provides a queueing mechanism, scheduling policy, priority scheme and resource classifications. Unlike most batch systems, however, it is equally capable of managing a pool of dedicated machines (e.g. a compute cluster) or a pool of non-dedicated machines (e.g. a group of desktop PCs). This second capability has enabled us to make good use of idle desktop machines within the School of Mathematics and Statistics.

It is important to recognise that the owner of each desktop machine within the Condor pool has ultimate control over whether, and when, their machine is made available to Condor. For a job to run on a particular desktop machine, that machine must be idle and any constraints imposed by the owner must also be satisfied. For example, one owner might make their machine available to Condor at weekends and between 8pm and 8am on weekdays. Another owner might be more generous and make their machine available to Condor whenever it has been idle for 15 minutes or more.
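
The two example owner policies above can be modelled in a few lines of Python. This is only an illustrative sketch of the decision logic, not Condor's own configuration syntax; the function names (weekend_and_nights_policy, idle_15_minutes_policy, can_start_job) are hypothetical and chosen for this example.

```python
from datetime import datetime, timedelta


def weekend_and_nights_policy(now: datetime, idle_for: timedelta) -> bool:
    """First example owner: weekends, plus 8pm to 8am on weekdays."""
    is_weekend = now.weekday() >= 5            # Monday is 0, so 5 and 6 are Sat/Sun
    is_night = now.hour >= 20 or now.hour < 8  # 8pm up to (not including) 8am
    return is_weekend or is_night


def idle_15_minutes_policy(now: datetime, idle_for: timedelta) -> bool:
    """Second example owner: any time the machine has been idle 15 minutes or more."""
    return idle_for >= timedelta(minutes=15)


def can_start_job(now, idle_for, owner_policy, machine_is_idle: bool) -> bool:
    """A job may start only if the machine is idle AND the owner's constraints hold."""
    return machine_is_idle and owner_policy(now, idle_for)


if __name__ == "__main__":
    wednesday_noon = datetime(2024, 1, 3, 12, 0)
    idle = timedelta(minutes=20)
    print(can_start_job(wednesday_noon, idle, weekend_and_nights_policy, True))
    print(can_start_job(wednesday_noon, idle, idle_15_minutes_policy, True))
```

The same idle machine at midday on a weekday is rejected under the first policy but accepted under the second, which is why the number of machines available to Condor varies through the day.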

Because Condor runs jobs only on idle machines, it must stop a running job when the machine is no longer idle (e.g. the owner has resumed work after lunch). As soon as Condor detects any activity (keyboard, mouse or process) by the owner of the desktop, it suspends the Condor job. If the machine returns to an idle state within a certain time, the job resumes execution. Otherwise, Condor must migrate the job elsewhere. How the job migrates from one machine to another depends upon the job's universe.
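
The suspend, resume and migrate behaviour described above can be pictured as a small state machine. The sketch below is a simplified model for illustration, not Condor's implementation. The names JobState and next_state are hypothetical, and the 10-minute limit is an assumed value, since the text only says "a certain time".

```python
from enum import Enum


class JobState(Enum):
    RUNNING = "running"
    SUSPENDED = "suspended"
    MIGRATING = "migrating"


# Assumed value for illustration only; the real limit is set by the pool's policy.
MAX_SUSPEND_SECONDS = 600


def next_state(state: JobState, owner_active: bool, seconds_suspended: int) -> JobState:
    """Return the job's next state given owner activity on the desktop."""
    if state is JobState.RUNNING:
        # Any keyboard, mouse or process activity by the owner suspends the job.
        return JobState.SUSPENDED if owner_active else JobState.RUNNING
    if state is JobState.SUSPENDED:
        if not owner_active:
            # The machine became idle again, so the job resumes in place.
            return JobState.RUNNING
        if seconds_suspended >= MAX_SUSPEND_SECONDS:
            # The owner is still busy after the limit, so the job must move elsewhere.
            return JobState.MIGRATING
        return JobState.SUSPENDED
    # Once migrating, this model leaves the job alone; what happens next
    # depends on the job's universe.
    return state


if __name__ == "__main__":
    state = JobState.RUNNING
    state = next_state(state, owner_active=True, seconds_suspended=0)
    print(state)
    state = next_state(state, owner_active=True, seconds_suspended=MAX_SUSPEND_SECONDS)
    print(state)
```

Keeping the transition rule in one pure function makes each case (stay running, suspend, resume, migrate) easy to check on its own.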