First Steps to Running a Job
Following these instructions should give you a basic grounding in the different elements required to run a computational job on a cluster.
Request and activate your account
- Determine which system(s) you wish to use. A summary is available, but if your area has bought into Katana, then that is probably your best option.
- Request an account on the cluster or system that you would like to use by emailing the UNSW IT Service Centre (ITServiceCentre@unsw.edu.au).
- Log on to the cluster to confirm that your account works. Unless you are using an application that only works as a graphical application (like some Java-based software installed on Katana), you should use an SSH client such as PuTTY to access Katana. Remember also that a cluster is not like a normal computer. When you log on to the cluster you are logging on to a special node called the head node. The head node is there so that you can see what software is installed, transfer files around and submit jobs to the compute nodes. You should not use it to run computations; instead, use the qsub command, which is described below.
- You can use the following commands at the command prompt.
- ls - list the files and directories in the current directory.
- cd fred - move into a directory called fred.
- rm fred - remove a file called fred.
- network - connect your H-drive
- network -u - disconnect your H-drive
- quota - discover how much space you have used and how much space you have available
- module avail - list the software installed on the cluster
Note: If you want to use more commands, the website linuxcommand.org is a great place to start learning about the commands that you can use in a Linux terminal.
Set up the environment for your jobs
- Use the command
module avail
to list the versions of software available on the cluster. If the software you need is not installed on the cluster, email the IT Service Centre (ITServiceCentre@unsw.edu.au) with details of the software that you would like installed.
- Learn more about the module(s) that you are interested in by using the module help command. For example, type
module help repeatmasker/4.0.5
module help antismash/2.0.2
to find out where the application is installed, what other modules it may depend on and where to get more information on the application.
- Load the module(s) that you want to use with the module add command.
- Confirm that your required modules are loaded using the command module list.
- Now that you have seen how the module command works, you can set things up so that the modules you use are loaded automatically when a job is started. To do this, type
to edit your startup file and add the module add command(s) that you used earlier, for example,
module add matlab/2014b
to the end of the file. (This is necessary because loaded modules are not persistent: they are forgotten as soon as you log out.) This is probably the best approach to start with, but you will probably want to move the module commands into your job scripts (explained below) once you start to use Katana more extensively.
- You should become familiar with using the module command to list and load software on the cluster. You should also check the software pages, module help and the software FAQ to see if there is anything that specifically relates to the software that you will be using.
- Transfer any files that you need to the cluster, such as data files and command files. There are several different ways of moving files to and from the cluster, and more information is available here. To begin with, the two simplest methods of moving data to the cluster are:
- Whilst logged in to the cluster, use the network command to mount your UNSW Home Drive (H Drive), which will then appear as hdrive on the cluster. You can then copy your scripts to your home directory on the cluster and any other data to scratch. You need to copy the data across because the H Drive is not available on the individual compute nodes. To copy the files, use the cp or rsync command.
- Download and install either WinSCP or the more powerful FileZilla, which has all of the functionality of WinSCP plus additional features such as remote file editing and advanced folder synchronization. Then use SFTP to connect to katana.science.unsw.edu.au.
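If you would rather script two of the setup chores above (adding a module add line to your startup file, and copying data from the mounted H Drive to scratch), the following is a minimal POSIX shell sketch. The helper names add_startup_module and copy_to_scratch are hypothetical, not Katana commands, and the sketch assumes your startup file is ~/.bashrc; check which startup file Katana actually uses before relying on it.

```shell
# Hypothetical helpers; the names are illustrative, not Katana commands.

# Append a line such as "module add matlab/2014b" to a startup file,
# skipping it if exactly that line is already present.
# Usage: add_startup_module FILE LINE
add_startup_module() {
    local file line
    file="$1"
    line="$2"
    touch "$file"
    # -q: quiet, -x: match the whole line, -F: treat the text literally
    if ! grep -qxF -- "$line" "$file"; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Copy the contents of a source directory (e.g. one on the mounted H Drive)
# into a destination directory (e.g. your scratch space), creating it if needed.
# Usage: copy_to_scratch SRC DEST
copy_to_scratch() {
    local src dest
    src="$1"
    dest="$2"
    mkdir -p "$dest"
    # -a keeps timestamps and permissions; "$src/." copies the contents
    # of src rather than the src directory itself
    cp -a "$src/." "$dest/"
}
```

For example, add_startup_module ~/.bashrc "module add matlab/2014b" adds the Matlab module line only once, however many times you run it. Similarly, copy_to_scratch ~/hdrive/myproject "$SCRATCH_DIR/myproject" copies a project across, where both the location of hdrive in your home directory and the SCRATCH_DIR variable (which you would set to your own scratch directory) are assumptions.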
Using an interactive session to get everything working
- You will need to prepare an application to be run by the job script. If you have been using Matlab interactively, for example, then you should save the Matlab commands in a file that will be opened by Matlab on the cluster. You may want to initially work on your local computer, export the commands to a file and then copy them over to Katana to simplify the development process.
- If you are going to be modifying the command file regularly then you may want to consider using version control to ensure that a copy of every version of your commands is kept. The Faculty Computing Unit has both Subversion (SVN) and Git servers available for you to use.
- Now open an interactive session using the command
qsub -I
and then confirm that your application works and that your code runs. This gives you a terminal session with 1 core and 1 GB of memory that lasts for 1 hour. The command that you used to run your code is the command that you will need to enter into the job script. If you require more memory or time (which you almost certainly will), change the resource requirements based on the available settings page so that the command becomes something like
qsub -I -l nodes=1:ppn=1,vmem=8gb,walltime=4:00:00
which will give you a terminal for 4 hours with 8 GB of memory. Only run the interactive job long enough to confirm that everything is working as it should, and then move on to creating a job script for a non-interactive job.
- Whilst you have your interactive session open, you may want to check that the command script that you prepared earlier works properly. This will require you to work out which command-line options your script needs.
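The resource request in the qsub -I example above follows a fixed pattern, so it can help to build it from its parts and to check the walltime against the 12-hour threshold that Katana uses for long jobs. This is a minimal POSIX shell sketch: pbs_resources, walltime_seconds and is_long_job are hypothetical helper names, and the sketch assumes walltimes are always written as HH:MM:SS.

```shell
# Hypothetical helpers; the names are illustrative, not Katana commands.

# Build a qsub -l resource string from its parts.
# Usage: pbs_resources NODES PPN VMEM WALLTIME
pbs_resources() {
    printf 'nodes=%s:ppn=%s,vmem=%s,walltime=%s\n' "$1" "$2" "$3" "$4"
}

# Convert a walltime written as HH:MM:SS into a number of seconds.
walltime_seconds() {
    local t h m s
    t="$1"
    h=${t%%:*}; t=${t#*:}
    m=${t%%:*}; s=${t#*:}
    # Strip one leading zero so values such as 08 are not read as octal
    h=${h#0}; m=${m#0}; s=${s#0}
    echo $(( ${h:-0} * 3600 + ${m:-0} * 60 + ${s:-0} ))
}

# Succeed (exit status 0) if the walltime is longer than 12 hours.
is_long_job() {
    [ "$(walltime_seconds "$1")" -gt $(( 12 * 3600 )) ]
}
```

With these defined, qsub -I -l "$(pbs_resources 1 1 8gb 4:00:00)" makes the same request as the example above, and is_long_job 4:00:00 fails because 4 hours is well under the long-job threshold.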
Creating a job script
- Once you know that your commands work, it is time to create a batch script. The job script pages in this documentation will help you create it.
- Submit your job and monitor it using the various tools.
- Wait to receive an email telling you that your job has completed. Whilst you are waiting, you may want to look at the Katana node list to see which nodes you have access to for long jobs (i.e. those taking over 12 hours), and at the scheduling policies, which tell you how many jobs you can have running at any time.
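A job script is a shell script whose #PBS comment lines carry the options you would otherwise pass to qsub. The sketch below writes a minimal example to myjob.pbs. It assumes that Katana accepts the same -l resource syntax as the interactive example, and the job name, email address and Matlab command file (mycommands.m) are placeholders to replace with your own. You could equally create the file in a text editor.

```shell
# Write a minimal example PBS job script to myjob.pbs.
# The job name, email address and mycommands.m are placeholders.
cat > myjob.pbs <<'EOF'
#!/bin/bash
#PBS -N myjob
#PBS -l nodes=1:ppn=1,vmem=8gb,walltime=4:00:00
# Email when the job aborts (a) or ends (e); replace the address with yours
#PBS -m ae
#PBS -M your.name@unsw.edu.au
# Merge standard output and standard error into one output file
#PBS -j oe

# Load software in the script rather than relying on your startup file
module add matlab/2014b

# Start in the directory from which the job was submitted
cd "$PBS_O_WORKDIR"

# Run the same command that worked in your interactive session
matlab -nodisplay -nosplash -r "run('mycommands.m'); exit"
EOF
```

You would then submit it with qsub myjob.pbs. The -m ae and -M lines are what ask the scheduler to send the completion email mentioned above.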