UNSW - Science - HPC

Bioscience Software

There is an extensive range of bioscience software installed on Katana. You can use the module command to list all of the installed software. If something you need is not currently installed, you can request it by sending an email to ITServiceCentre@unsw.edu.au, mentioning Katana and the software that you would like installed. Below are notes, information and sample job scripts for a few of the bioscience-related applications that have been installed.

Databases

A number of databases from the NCBI that can be used with BLAST+ (and other tools) have been downloaded from http://www.ncbi.nlm.nih.gov/Ftp into /share/bio/blast so that you do not need to download them yourself. If you would like the databases updated or new databases added, please email the IT Service Centre.
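To see which databases are present before writing a job script, you could scan the shared directory. The following is a minimal sketch, not an official Katana tool. It assumes that databases can be recognised by the common BLAST index and alias file suffixes (.nin, .pin, .nal, .pal); databases stored in a different layout would not be listed. The function name list_blast_databases is hypothetical.

```python
import os

# Index and alias file suffixes assumed to mark a BLAST database.
# This is an assumption about how the files in the directory are laid out.
DB_SUFFIXES = (".nin", ".pin", ".nal", ".pal")


def list_blast_databases(directory):
    """Return the sorted base names of files in directory with a known suffix."""
    names = set()
    for entry in os.listdir(directory):
        base, ext = os.path.splitext(entry)
        if ext in DB_SUFFIXES:
            names.add(base)
    return sorted(names)


if __name__ == "__main__":
    blast_dir = "/share/bio/blast"  # location given on this page
    if os.path.isdir(blast_dir):
        for name in list_blast_databases(blast_dir):
            print(name)
    else:
        print("%s not found; this sketch is meant to be run on Katana" % blast_dir)
```

Multi-volume databases will show one entry per volume as well as the alias name, so the output is a starting point rather than a definitive list.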

BioConductor

BioConductor (www.bioconductor.org) is a collection of R packages for the biosciences. On Katana it is provided as a standard installation of R with a specific selection of packages installed.

BioPerl

BioPerl is not an application as such; it is a package containing an extensive collection of bioscience functions available within Perl. The BioPerl functions can be accessed by loading any of the Perl or BioPerl modules. If a Perl module that you need is not installed, please email the IT Service Centre. If you are developing your own Perl application, start your script with '#! /usr/bin/env perl' to ensure that the code uses the version of Perl loaded by the module command.

BioPython

BioPython is likewise a package containing an extensive collection of bioscience functions, available within Python. BioPython can be accessed by loading any of the Python or BioPython modules, as the BioPython module is just an alias for Python. If a Python module that you need is not installed, please email the IT Service Centre. If you are developing your own Python application, start your script with '#! /usr/bin/env python' to ensure that the code uses the version of Python loaded by the module command.
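To check that a script really is picking up the interpreter loaded by the module command, a short script along the following lines may help. This is a sketch only, assuming Python 3.4 or later (for importlib.util.find_spec); the function name interpreter_report is hypothetical. It reports the interpreter path and version, and whether the Bio package (BioPython) can be imported.

```python
#! /usr/bin/env python
"""Report which Python interpreter is running and whether BioPython is importable."""
import importlib.util
import sys


def interpreter_report():
    """Return a dict describing the interpreter that is executing this script."""
    return {
        "executable": sys.executable,
        "version": "%d.%d.%d" % sys.version_info[:3],
        # find_spec returns None when the top-level package is not installed.
        "biopython_available": importlib.util.find_spec("Bio") is not None,
    }


if __name__ == "__main__":
    for key, value in interpreter_report().items():
        print("%s: %s" % (key, value))
```

Running the script before and after loading a Python module should show the executable path change if the module has taken effect.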

BLAST+ (http://blast.ncbi.nlm.nih.gov)

BLAST+ is a suite of BLAST-like tools from the NCBI (National Center for Biotechnology Information) that uses the NCBI C++ Toolkit. The BLAST+ applications have a number of performance and feature improvements over the legacy BLAST applications. BLAST+ is used by a number of applications that have been installed on Katana and can also be used as a standalone application.

By default, BLAST+ runs using a single thread, and you will get no benefit from requesting more than 1 core. To use BLAST+ with more than 1 thread (i.e. running on more than 1 core at the same time), add the flag '-num_threads X', where X is the number of threads that you wish to use.

NOTE: Because of the way that BLAST+ manages its compute processes, if you want to use X threads you will need to request X+1 cores in your job script.
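The thread and core arithmetic above can be sketched as a small helper that also assembles a blastn command line. This is an illustration rather than a Katana-supplied script. The function names, the file names query.fasta and results.txt, and the database name nt are all hypothetical. The sketch also reads the X+1 rule as applying only when more than 1 thread is used, since a default single-threaded run is described as needing only 1 core.

```python
def cores_to_request(num_threads):
    """Return the number of cores to request for a BLAST+ run with num_threads threads."""
    if num_threads < 1:
        raise ValueError("num_threads must be at least 1")
    # Interpretation: a single-threaded run needs 1 core; X threads need X + 1 cores.
    return 1 if num_threads == 1 else num_threads + 1


def blastn_command(query, database, output, num_threads=1):
    """Build the argument list for a blastn run with an explicit thread count."""
    return [
        "blastn",
        "-query", query,
        "-db", database,
        "-out", output,
        "-num_threads", str(num_threads),
    ]


if __name__ == "__main__":
    threads = 4
    # Hypothetical inputs; "nt" is assumed to exist under /share/bio/blast.
    command = blastn_command("query.fasta", "/share/bio/blast/nt", "results.txt", threads)
    print("cores to request:", cores_to_request(threads))
    print(" ".join(command))
```

With 4 threads the helper reports 5 cores, which is the value you would then put in the resource request of your job script.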

The user manual is available at http://www.ncbi.nlm.nih.gov/books/NBK1762 and release notes are available at http://www.ncbi.nlm.nih.gov/books/NBK131777. Further information about BLAST+ is available online at http://blast.ncbi.nlm.nih.gov.

Mothur (www.mothur.org)

If you wish to run Mothur on Katana, there are 2 options available for each version.

If you use the command mothur then by default it will use a single core on a single node. You can also use multiple cores within a single node to speed up computation by adding the 'processors=n' option, where n is the number of cores that you have requested for your job, to each mothur command that supports that option.

If you want to run your Mothur job across multiple compute nodes then you can use the command mothur-mpi in conjunction with mpirun or mpiexec (part of the OpenMPI package).
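The two options can be sketched as helpers that format the text you would run. This is an illustration under stated assumptions: the function names, the input file example.fasta and the batch file example.batch are hypothetical, summary.seqs is used only as an example of a mothur command that accepts processors, and the MPI launcher is assumed to take the process count through -np.

```python
def mothur_command(name, processors, **options):
    """Format one mothur command with a processors=n option appended."""
    if processors < 1:
        raise ValueError("processors must be at least 1")
    parts = ["%s=%s" % (key, value) for key, value in options.items()]
    parts.append("processors=%d" % processors)
    return "%s(%s)" % (name, ", ".join(parts))


def mothur_mpi_argv(batch_file, processes, launcher="mpirun"):
    """Build the launcher argument list for running mothur-mpi on a batch file."""
    if launcher not in ("mpirun", "mpiexec"):
        raise ValueError("launcher must be mpirun or mpiexec")
    return [launcher, "-np", str(processes), "mothur-mpi", batch_file]


if __name__ == "__main__":
    # Option 1: single node, 4 cores (example.fasta is a hypothetical input file).
    print(mothur_command("summary.seqs", 4, fasta="example.fasta"))
    # Option 2: 16 MPI processes (example.batch is a hypothetical batch file).
    print(" ".join(mothur_mpi_argv("example.batch", 16)))
```

In both cases the number passed in should match the number of cores requested in your job script.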

NOTE: Mothur can have high memory use when more than 1 core is being used.