Frequently Asked Questions about Katana

Katana is a blade-based cluster available for use by members of the groups that have bought in to it. The extensive information under the HPC Basics area of this site, combined with the Katana-specific information, is a good starting point for making use of Katana. Answers to some commonly asked questions about Katana are included below.

Does Katana run a 32-bit or a 64-bit operating system?

Katana runs a 64-bit version of the CentOS distribution of Linux.

How much memory is available per core and/or per node on Katana?

The amount of memory available varies across the cluster. To determine how much memory each node has available, use the 'pestat' command.

How much memory can I use on the head node on Katana for compiling software?

The head node has a total of 24GB of memory. Each individual user is limited to 6GB, and the head node should only be used for compiling software.

Why isn't my job making it onto a node on Katana even though it says that some nodes are free?

There are three main reasons why you might see this behavior. The first is specific to Katana; the other two apply to any cluster.

Firstly, the compute nodes in Katana belong to various schools and research groups across the Faculty of Science (and beyond). Any job with an expected run-time longer than 12 hours can only run on a compute node that is somehow associated with the owner of the job. For example, if you are in the CCRC you are entitled to run 12+ hour jobs on the Science nodes and the nodes jointly purchased by CCRC. However, you cannot run 12+ hour jobs on the nodes purchased by Astrobiology, Statistics, TARS, CEPAR or Physics. So you may see idle nodes, but you may not be entitled to run a 12+ hour job on them.
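In practice this means a job that requests a walltime of 12 hours or less stays eligible for any compute node. A minimal sketch of such a job script is below; the PBS directives and resource values are illustrative, and the exact syntax may differ on your scheduler version:

```shell
#!/bin/bash
# Sketch of a job script whose walltime stays at or under 12 hours,
# so it remains eligible for any compute node (illustrative values).
#PBS -l walltime=12:00:00            # at most 12 hours
#PBS -l nodes=1:ppn=1,mem=4gb        # one core, 4GB of memory
cd "${PBS_O_WORKDIR:-.}"             # PBS sets this to the submission directory
echo "short job running on $(hostname)"
```

Submitted with qsub, a job like this can be placed on idle nodes owned by other groups because it fits under the 12-hour threshold.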

Secondly, the idle nodes may not have sufficient resources for your job. For example, there may not be enough CPU cores or memory available on a single compute node.

Thirdly, there may be distributed memory jobs ahead of your job in the queue which have reservations on the idle nodes, and they are just waiting for all of their requested resources to become available. In this case, your job can only use the reserved nodes if your job can finish before the nodes are required by the distributed memory job.

How many jobs can I submit at the one time?

Technically you can submit as many jobs as you wish. The scheduling system takes into account the owner of the available nodes, the current load on the system, the requirements of your jobs and your usage of the cluster to determine which jobs are assigned to a node as space becomes available. In short, if you have submitted a large number of jobs, you should expect that someone could come along afterwards and submit jobs that start to run ahead of some of your queued jobs.

Whilst there is no technical limit to the number of jobs you can submit, submitting more than 2,000 jobs at the one time can place an unacceptable load on the job scheduler, and your jobs may be deleted without warning.
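If those jobs are many instances of the same task, one way to stay under that load is a single array job, which the scheduler treats as one submission. The sketch below uses PBS Pro style syntax (`-J` and `PBS_ARRAY_INDEX`); other PBS variants use `-t` and `PBS_ARRAYID` instead, so check your scheduler's documentation. The index is defaulted so the sketch also runs outside the scheduler:

```shell
#!/bin/bash
# Sketch: one array job covering 500 tasks instead of 500 separate jobs.
# PBS Pro syntax shown; Torque-style schedulers use "#PBS -t 1-500".
#PBS -J 1-500
IDX="${PBS_ARRAY_INDEX:-1}"          # set by the scheduler for each task
echo "processing input file data_${IDX}.txt"
```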

How many cores of Katana can I use at once over all of my jobs?

The Job Scheduling and Queues page has information about the maximum number of cores that you can use at the one time.

What is the maximum number of CPUs I can use in parallel on Katana?

If you regularly want to run large parallel jobs on Katana, you should consider speaking to the Faculty of Science HPC team so that they are aware of your jobs. They may be able to provide you with additional assistance on resource usage for parallel jobs.

Why does my SSH connection to Katana periodically disconnect?

With all networks there is a limit to how long a connection between two computers will stay open if no data is travelling between them. More information about how to keep the connection open is available on the cluster access page.
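One common client-side fix is to have SSH send periodic keepalive packets so the connection never looks idle. A sketch for `~/.ssh/config`; the host alias and hostname here are placeholders, not Katana's actual address:

```
# ~/.ssh/config -- keepalive settings (hostname below is a placeholder)
Host katana
    HostName katana.example.edu    # replace with the real login node address
    ServerAliveInterval 60         # send a keepalive every 60 seconds
    ServerAliveCountMax 3          # drop the link after 3 unanswered probes
```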

I was running a program and I suspect it failed due to the limit on the maximum array size. Is there an array size limit on Katana?

Is the array statically allocated? If you are using the default memory model then such arrays are limited to 2GB. You will need the medium memory model to go beyond that limit; see the -mcmodel compiler option.

How can I see how much of the requested resources my job used?

If you use the -M and -m flags to request an email when your job finishes, that email will include a summary of the resources used by the job.
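For example, a job script requesting that email might look like the sketch below (the address is a placeholder):

```shell
#!/bin/bash
# Sketch: -M sets the notification address, -m picks the events.
#PBS -M your.name@example.edu        # placeholder address
#PBS -m ae                           # mail on abort (a) and at end (e)
echo "job finished; the resource summary arrives in the completion email"
```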