MATLAB
Basics
Multiple versions of MATLAB are available on Katana. Before using any version of MATLAB, you must configure your environment with the module command.
For example, to see which versions of MATLAB are installed, use the module avail command as shown here:
[z1234567@katana ~]$ module avail matlab
/share/apps/modules/matlab
matlab/2007b matlab/2010b matlab/2012a matlab/2013b matlab/2015a matlab/2016b
matlab/2008b matlab/2011a matlab/2012b matlab/2014a matlab/2015b
matlab/2009b matlab/2011b matlab/2013a matlab/2014b matlab/2016a
With so many versions of MATLAB available, you should always specify exactly which version you wish to use. For example, to use MATLAB 2013a:
[z1234567@katana ~]$ module add matlab/2013a
Issuing that module command makes the required changes to the $PATH and $LD_LIBRARY_PATH environment variables for the duration of your current terminal session. Please read the module documentation for instructions on how to apply these changes to your environment permanently.
Running MATLAB as batch versus interactively
If you currently run MATLAB interactively, that is, by typing commands directly into MATLAB, you can save those commands in a text file to create a MATLAB script. You can then submit one or more jobs that run the script without any interaction.
If you save your MATLAB script in a file called mymatlabscript.m, then the following PBS job script is an example of how to run it as a PBS batch job:
#!/bin/bash
#PBS -l nodes=1:ppn=1
#PBS -l vmem=4gb
#PBS -l walltime=12:00:00
cd ${PBS_O_WORKDIR}
matlab -nodisplay -r mymatlabscript
This script defines a job that requires one CPU core and 4GB of memory with an expected runtime of 12 hours. By default a PBS job starts in the user's $HOME directory, but the $PBS_O_WORKDIR environment variable can be used to start the job in the directory from which it was submitted. No X session is available to a PBS batch job, so the MATLAB -nodisplay option is essential.
If you use the -r option, MATLAB expects the name of a MATLAB command. This can be a MATLAB file in the current path, but it must NOT include the .m extension. Thus matlab -nodisplay -r myjob.m will NOT work, as myjob.m is not a MATLAB command.
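As a minimal sketch of the rule above, the following bash helper strips a trailing .m from a file name so that the result can be passed to -r. The function name matlab_command_name is hypothetical and is not part of MATLAB or PBS; the example only prints the resulting command line rather than starting MATLAB.

```shell
#!/bin/bash
# Hypothetical helper (not part of MATLAB or PBS): convert a script file
# name such as myjob.m into the command name that the -r option expects.
matlab_command_name() {
    # ${1%.m} removes a trailing ".m" if present and leaves other names unchanged.
    printf '%s\n' "${1%.m}"
}

# Print the command line that would be used in the job script.
echo "matlab -nodisplay -r $(matlab_command_name myjob.m)"
```

In a job script you would replace the echo line with the matlab command itself.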
Implicit Multithreading
MATLAB support for multithreaded computation was first incorporated into MATLAB 2007a and is enabled by default in MATLAB 2008a and later. As mentioned in the MATLAB documentation, multithreading in MATLAB speeds up element-wise computations, such as those done by the sin and log functions, and computations that use the Basic Linear Algebra Subprograms (BLAS) library, such as matrix multiplication.
Consequently, if you are using MATLAB 2008a or any later version, it is quite likely that your scripts are already multithreaded. This has very important consequences for MATLAB jobs submitted to a cluster such as Katana. It would be very easy to assume that your MATLAB script is not multithreaded and request a single CPU core for your job, but if that job actually uses multiple threads then it will adversely affect other jobs on the cluster. Therefore, take care to ensure that your MATLAB script uses only the resources requested in your PBS job script.
In earlier versions of MATLAB it was possible to constrain the number of threads used by MATLAB with the maxNumCompThreads(N) function. However, this is no longer effective in recent versions of MATLAB. Therefore, the only options are: (a) reserve a whole compute node (ncpus=X, where X is the number of cores on the node that you want to use) and let MATLAB use all available cores; or (b) restrict MATLAB to one thread with the command-line flag -singleCompThread and reserve only one core in the job script (ncpus=1).
Use Whole Compute Node:
#!/bin/bash
#PBS -l nodes=1:ppn=8
#PBS -l vmem=8gb
#PBS -l walltime=12:00:00
cd ${PBS_O_WORKDIR}
matlab -nodisplay -r mymatlabscript

Use Single Core:
#!/bin/bash
#PBS -l nodes=1:ppn=1
#PBS -l vmem=1gb
#PBS -l walltime=12:00:00
cd ${PBS_O_WORKDIR}
matlab -singleCompThread -nodisplay -r mymatlabscript
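The choice between the two approaches can also be made inside a single script. The sketch below assumes that the number of requested cores is passed in as an argument; the function name matlab_thread_flags is hypothetical, and how you obtain the core count depends on how your job script is written. It only prints the command line rather than starting MATLAB.

```shell
#!/bin/bash
# Hypothetical helper: choose MATLAB start-up flags from the number of
# cores requested in the job script (assumed to be supplied by the caller).
matlab_thread_flags() {
    local cores="${1:-1}"   # default to one core if no argument is given
    if [ "$cores" -le 1 ]; then
        # One core requested: stop MATLAB from using extra threads.
        echo "-singleCompThread -nodisplay"
    else
        # More than one core: this sketch assumes the whole node was
        # reserved, so implicit multithreading is left enabled.
        echo "-nodisplay"
    fi
}

# Print the command line for a single-core job.
echo "matlab $(matlab_thread_flags 1) -r mymatlabscript"
```

Keeping the decision in one place makes it harder to request a single core and forget the -singleCompThread flag.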
Resource Requirements
In recent versions of Linux (those that include glibc 2.11 or greater) the way that memory is allocated for multithreaded programs has changed. The net result is that multithreaded programs, such as MATLAB (with the JVM), address a much larger pool of memory than they did previously. As a result, starting MATLAB (with the JVM) requires at least 3GB of memory before any calculations are performed.
There are several ways around this problem:

Revert to the old memory allocation behaviour, which reserves less memory, by issuing the following command before starting MATLAB:
export MALLOC_ARENA_MAX=1

Ask for more memory. For example, if you requested only 2GB of memory, you can request 4GB instead by changing the value of vmem:
qsub -I -l nodes=1:ppn=1,vmem=4gb,walltime=1:00:00

If your MATLAB code does not require the JVM, you can disable it and also limit MATLAB to a single core:
matlab -nodisplay -nojvm -singleCompThread -r main
If you request a single core (nodes=1:ppn=1) for a MATLAB job, you should use the MATLAB flag -singleCompThread. Otherwise MATLAB's implicit multithreading can oversubscribe your resources without you realising it.
Parallel Computing Toolbox
The Parallel Computing Toolbox (PCT) offers support for parallel computing in MATLAB beyond that provided by implicit multithreading. With the PCT it is possible to create a pool of workers, managed by the matlabpool function (superseded by parpool from R2013b onwards), which can be used to process a number of independent tasks in parallel. However, all of the workers in the pool must run on the same compute node. The number of MATLAB workers that you can create depends on the version of MATLAB that you are running. That information is in the table below, or you can look at the MATLAB release notes for the version that you are interested in, which are available at http://www.mathworks.com.au/help/distcomp/releasenotes.html.
Versions            Maximum Workers
R2007a-R2008b       4
R2009a-R2011a       8
R2011b-R2013b       12
R2014a and later    Physical limit of node
The most commonly used feature of the PCT for distributing tasks across a pool of workers is the parallel for-loop. By simply replacing the for keyword with the parfor keyword, a for-loop can be converted to run in parallel:
Serial for-Loop  Parallel for-Loop
In this example the parfor-loop iterations are automatically distributed across the 4 workers created with the matlabpool function. Since the iterations execute in parallel, it is crucial that each iteration within the parfor-loop is completely independent of all other iterations. It is also important that the associated PBS job script requests sufficient resources for the workers defined in the MATLAB script. A single resource chunk with ncpus equal to the number of workers should be requested so that one CPU core is available to each worker and all CPU cores are allocated from the same compute node. The following PBS script is suitable for use with the previous parfor-loop example:
#!/bin/bash
#PBS -l select=1:ncpus=4:mem=4gb
#PBS -l walltime=12:00:00
cd ${PBS_O_WORKDIR}
matlab -nodisplay -r mymatlabscript
Note: If you start multiple MATLAB jobs that use parallel workers at the same time (for example via a script or as an array job), the parallel workers can clash and cause a parallel pool error. One way around this is to introduce a random delay before each MATLAB job starts, which gives the workers of one job time to finish starting before the next job begins. To do this, add the following line to your job script immediately before the line where MATLAB is launched:
sleep $(( (RANDOM % 240) + 1 ))
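The random delay can also be wrapped in a small helper so that the maximum is easy to adjust. This is only a sketch: the function name random_delay is hypothetical, and the example prints the chosen delay instead of sleeping so that it can be tried safely on a login node.

```shell
#!/bin/bash
# Hypothetical helper: print a random whole number of seconds between 1 and
# the given maximum (240 by default, matching the delay described above).
random_delay() {
    local max_seconds="${1:-240}"
    # RANDOM is a bash built-in variable; the modulo gives 0..max-1, so add 1.
    echo $(( (RANDOM % max_seconds) + 1 ))
}

delay=$(random_delay 240)
echo "Would sleep for ${delay} seconds before starting MATLAB"
```

In a job script you would pass the value to sleep, for example sleep "$(random_delay 240)", on the line before MATLAB is launched.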
More detail on parfor-loops and other ways to use MATLAB workers can be found in the Parallel Computing Toolbox User Guide.