UNSW - Science - HPC

MATLAB

Basics

Multiple versions of MATLAB are available on Katana. Before using any version of MATLAB it is necessary to configure your environment with the module command.

For example, to see which versions of MATLAB are installed, use the module avail command as shown here:

[z1234567@katana ~]$ module avail matlab
-------------------------------- /share/apps/modules/matlab --------------------------------
matlab/2007b   matlab/2010b    matlab/2012a    matlab/2013b     matlab/2015a    matlab/2016b
matlab/2008b   matlab/2011a    matlab/2012b    matlab/2014a     matlab/2015b
matlab/2009b   matlab/2011b    matlab/2013a    matlab/2014b     matlab/2016a

With so many different versions of MATLAB available, you should always specify exactly which version you wish to use. For example, to use MATLAB 2013a:

[z1234567@katana ~]$ module add matlab/2013a

Issuing that particular module command will make the required changes to the $PATH and $LD_LIBRARY_PATH environment variables for the duration of your current terminal session. Please read the module documentation for instructions on how to permanently apply these changes to your environment.
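One common way to make the change permanent is to add the module add line to a shell startup file. The sketch below assumes that your login shell is bash and reads ~/.bashrc. Check the module documentation for the method recommended on Katana. The function name persist_module is hypothetical. It appends the line only if that exact line is not already present:

```shell
# Hypothetical helper: append "module add <name>" to a shell startup file
# (default ~/.bashrc) unless that exact line is already present, so running
# it more than once does not add duplicate lines.
persist_module() {
    local line="module add $1"
    local rc_file="${2:-$HOME/.bashrc}"
    # -x matches whole lines, -F treats the text literally
    if ! grep -qxF -- "$line" "$rc_file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$rc_file"
    fi
}

# Example use:
#   persist_module matlab/2013a
```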

Running MATLAB as batch versus interactively

If you currently run MATLAB interactively, that is, by typing commands directly into MATLAB, you can save those commands in a text file to create a MATLAB script. You can then submit one or more jobs that run the script without any need for interaction.

If you save your MATLAB script in a file called mymatlabscript.m, then the following PBS job script shows how to run it as a PBS batch job:

#!/bin/bash
 
#PBS -l nodes=1:ppn=1
#PBS -l vmem=4gb
#PBS -l walltime=12:00:00
 
cd ${PBS_O_WORKDIR}
 
matlab -nodisplay -r mymatlabscript

This script defines a job that requires one CPU core and 4 GB of virtual memory (vmem), with an expected runtime of 12 hours. By default a PBS job starts in the user's $HOME directory. Changing to $PBS_O_WORKDIR starts the job in the directory from which it was submitted. No X session is available to a PBS batch job, so the MATLAB -nodisplay option is essential.

The -r option expects a MATLAB command, which can be the name of a MATLAB script or function file on the current path, BUT it must NOT include the .m extension. Thus matlab -nodisplay -r myjob.m will NOT work, because myjob.m is not a MATLAB command. Use matlab -nodisplay -r myjob instead.
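The sketch below uses a hypothetical helper called matlab_r_cmd. It is not part of Katana or MATLAB. It avoids the .m mistake by building the -r argument from the script's file name. It also wraps the call in a try/catch so that MATLAB exits with status 1 if the script throws an error and with status 0 otherwise. That status then becomes the job script's exit status.

```shell
# Hypothetical helper: turn a script file name such as /path/to/myjob.m into a
# statement for "matlab -r". The .m extension and any directory are removed,
# and the call is wrapped in try/catch so that MATLAB exits with status 1 if
# the script throws an error and with status 0 otherwise.
matlab_r_cmd() {
    local name
    name=$(basename "$1" .m)
    printf 'try, %s, catch err, disp(err.message), exit(1), end, exit(0)\n' "$name"
}

# Example use in a PBS job script (the script must be in the current folder):
#   cd ${PBS_O_WORKDIR}
#   matlab -nodisplay -r "$(matlab_r_cmd mymatlabscript.m)"
```

Because basename strips the directory, MATLAB still has to find the script on its path. Changing to ${PBS_O_WORKDIR} first therefore remains necessary.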

Implicit Multithreading

MATLAB support for multithreaded computation was first incorporated into MATLAB 2007a and is enabled by default in MATLAB 2008a and later. As mentioned in the MATLAB documentation, multithreading in MATLAB speeds up elementwise computations such as those done by the sin and log functions, and computations that use the Basic Linear Algebra Subroutines (BLAS) library, such as matrix multiply.

Consequently, if you are using MATLAB 2008a or later, your scripts are quite likely already multithreaded. This has important consequences for MATLAB jobs submitted to a cluster such as Katana. It is easy to assume that your MATLAB script is not multithreaded and request a single CPU for your job. If that job actually uses multiple threads, however, it will compete for cores allocated to other jobs on the cluster. Therefore, take care to ensure that your MATLAB script uses only the resources requested in your PBS job script.

In earlier versions of MATLAB it was possible to limit the number of threads used by MATLAB with the maxNumCompThreads(N) function. However, this is no longer effective in recent versions of MATLAB. The only options are therefore:

(a) reserve a whole compute node (ppn=X, where X is the number of cores on the node) and let MATLAB use all of the cores; or
(b) restrict MATLAB to one thread with the command-line flag -singleCompThread and reserve only one core in the job script (ppn=1).

Use Whole Compute Node:

#!/bin/bash

#PBS -l nodes=1:ppn=8
#PBS -l vmem=8gb
#PBS -l walltime=12:00:00

cd ${PBS_O_WORKDIR}

matlab -nodisplay -r mymatlabscript

Use Single Core:

#!/bin/bash

#PBS -l nodes=1:ppn=1
#PBS -l vmem=1gb
#PBS -l walltime=12:00:00

cd ${PBS_O_WORKDIR}

matlab -singleCompThread -nodisplay -r mymatlabscript

Resource Requirements

In recent versions of Linux (those that include glibc 2.11 or later), memory is allocated differently for multithreaded programs. As a result, multithreaded programs such as MATLAB (with the JVM) reserve a much larger pool of virtual memory than they did previously. Starting MATLAB with the JVM therefore requires at least 3 GB of memory before any calculations are performed.

There are several ways around this problem:

  1. Revert to the old memory allocation behaviour, which reserves less memory, by running the following command before starting MATLAB:
    export MALLOC_ARENA_MAX=1
  2. Request more memory. For example, if you requested 2 GB, request 4 GB instead by changing the value of vmem:
    qsub -I -l nodes=1:ppn=1,vmem=4gb,walltime=1:00:00
  3. If your MATLAB code does not need the JVM, disable it and also limit MATLAB to a single core:
    matlab -nodisplay -nojvm -singleCompThread -r main
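Option 1 can be wrapped in a function so that the setting applies only to the MATLAB process it starts. The sketch below uses a hypothetical name, run_matlab_lowmem, and assumes that matlab is on your $PATH after the module add command:

```shell
# Hypothetical wrapper around option 1: the function body runs in a subshell
# (note the parentheses), so MALLOC_ARENA_MAX=1 is exported only to the MATLAB
# process started here and does not leak into the rest of the job script.
run_matlab_lowmem() (
    export MALLOC_ARENA_MAX=1
    matlab -nodisplay -r "$1"
)

# Example use in a PBS job script:
#   run_matlab_lowmem mymatlabscript
```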

If you request a single core (nodes=1:ppn=1) for a MATLAB job, start MATLAB with the -singleCompThread flag. Otherwise MATLAB's implicit multithreading can oversubscribe your allocated resources without you realising it.
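One job script can serve both single-core and whole-node requests if it adds the flag only when a single core was requested. The sketch below assumes that Torque exports PBS_NUM_PPN with the requested ppn value, so confirm this in your own job environment first. The function name matlab_thread_flag is hypothetical:

```shell
# Hypothetical helper: print -singleCompThread when the job has one core per
# node. Assumes Torque exports PBS_NUM_PPN (the ppn value requested); when it
# is unset, for example outside a PBS job, a single core is assumed.
matlab_thread_flag() {
    local ppn="${PBS_NUM_PPN:-1}"
    if [ "$ppn" -le 1 ]; then
        printf '%s\n' "-singleCompThread"
    fi
}

# Example use (left unquoted so that an empty result adds no argument):
#   matlab $(matlab_thread_flag) -nodisplay -r mymatlabscript
```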

Parallel Computing Toolbox

The Parallel Computing Toolbox (PCT) supports parallel computing in MATLAB beyond that provided by implicit multithreading. With the PCT you can create a pool of workers to process a number of independent tasks in parallel. The pool is managed by the matlabpool function. From MATLAB 2013b the replacement is the parpool function, and matlabpool was removed in MATLAB 2015a. All of the workers in the pool must run on the same compute node. The number of MATLAB workers you can create depends on the version of MATLAB you are running. The limits are listed in the table below, and the MATLAB release notes for each version are available at http://www.mathworks.com.au/help/distcomp/release-notes.html.

Versions          Maximum Workers
R2007a-R2008b     4
R2009a-R2011a     8
R2011b-R2013b     12
R2014a onwards    Physical limit of the node

 

The most commonly used PCT feature for distributing tasks across a pool of workers is the parallel for-loop. Replacing the for keyword with the parfor keyword converts a for-loop to run in parallel:

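Serial for-loop:

A = zeros(1, 1024);   % preallocate the result
for i = 1:1024
    A(i) = sin(i*2*pi/1024);
end

Parallel for-loop:

% Start a pool of 4 local workers. matlabpool was removed in MATLAB 2015a;
% there, use parpool('local', 4) and delete(gcp('nocreate')) instead.
matlabpool open local 4

A = zeros(1, 1024);   % preallocate the result
parfor i = 1:1024
    A(i) = sin(i*2*pi/1024);   % each iteration is independent of the others
end

matlabpool close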

In this example the parfor-loop iterations are automatically distributed across the 4 workers created by the matlabpool function. Because the iterations execute in parallel, each iteration of the parfor-loop must be completely independent of all other iterations. The PBS job script must also request enough resources for the workers defined in the MATLAB script. Request a single resource chunk with ncpus equal to the number of workers, so that each worker has its own CPU core and all of the cores are allocated on the same compute node. The following PBS script is suitable for the parfor-loop example above:

#!/bin/bash
 
#PBS -l select=1:ncpus=4:mem=4gb
#PBS -l walltime=12:00:00
 
cd ${PBS_O_WORKDIR}
 
matlab -nodisplay -r mymatlabscript
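To stop the worker count in the MATLAB script from drifting away from ncpus in the job script, you can pass the allocated core count into MATLAB. The sketch below assumes that PBS Pro exports NCPUS and that Torque exports PBS_NUM_PPN, so check both in your job environment. The function names and the MATLAB variable nworkers are hypothetical. The MATLAB script would then open its pool with matlabpool('open', 'local', nworkers), or with parpool('local', nworkers) in MATLAB 2013b and later.

```shell
# Hypothetical helpers: the number of cores allocated to the job, assuming
# PBS Pro exports NCPUS (from select=...:ncpus=N) and Torque exports
# PBS_NUM_PPN; falls back to 1 when neither variable is set.
job_cores() {
    echo "${NCPUS:-${PBS_NUM_PPN:-1}}"
}

# Start MATLAB with a variable (nworkers, a hypothetical name) holding the
# core count, so the MATLAB script can size its pool to match the request.
matlab_with_workers() {
    matlab -nodisplay -r "nworkers=$(job_cores); $1"
}

# Example use in the job script above:
#   matlab_with_workers mymatlabscript
```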

Note: If you start several MATLAB jobs that use parallel workers at the same time (for example, from a script or as an array job), their parallel pools can clash as they start up, causing a parallel pool error. One way around this is to delay each MATLAB job by a random amount. This gives one job's workers time to finish starting before the next job starts its own. Add the following line to your job script immediately before the line that launches MATLAB. It waits between 1 and 240 seconds:

sleep $(( (RANDOM % 240) + 1 ))
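For array jobs, an alternative to a random delay is to stagger start-up by task index, so that no two tasks wait the same amount of time. This sketch is an assumption, not a documented Katana recommendation. The 30 second spacing is a guess, and the delay counts against walltime: with 20 tasks, task 20 waits 570 seconds. The function name startup_delay is hypothetical.

```shell
# Hypothetical alternative to a random delay for array jobs: stagger start-up
# by task index so that task N waits (N - 1) * STAGGER seconds. STAGGER
# defaults to 30 seconds here, an assumed value rather than a tested one.
startup_delay() {
    local stagger="${1:-30}"
    local index="${PBS_ARRAYID:-1}"
    echo $(( (index - 1) * stagger ))
}

# Example use, immediately before the line that launches MATLAB:
#   sleep "$(startup_delay 30)"
```

Staggering only helps when the tasks actually start at about the same time. If the scheduler already spreads them out, the random delay above is simpler.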

More detail on parfor-loops and other ways to use MATLAB workers can be found in the Parallel Computing Toolbox User's Guide.

MATLAB PBS Examples

Whilst you are making sure that your MATLAB code works on the cluster, that the correct module is being loaded and that your code refers to the right locations for its files, you may want to use the following command to request an interactive session for 2 hours with 8 GB of memory:

[z1234567@katana ~]$ qsub -I -l nodes=1:ppn=1,vmem=8gb,walltime=2:00:00

Once you have a terminal, and know which version of MATLAB you want to run (having used the module avail command), load the MATLAB module with the module add command:

z1234567@kc01b02:~$ module add matlab/2014b
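A mistyped version such as 2104b instead of 2014b is easy to miss, so you may want to check that the module exists before loading it. The sketch below assumes that module is available as a shell command or function in your session, and that module avail may print its listing on stderr. The function name module_exists is hypothetical.

```shell
# Hypothetical helper: succeed only if "module avail" lists the exact module
# name given, such as matlab/2014b. The listing is read from both stdout and
# stderr because Environment Modules commonly prints it on stderr.
module_exists() {
    module avail "$1" 2>&1 | grep -qwF -- "$1"
}

# Example use: load the module only if the version really exists.
#   module_exists matlab/2014b && module add matlab/2014b
```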

MATLAB Array Job Examples

When you want to submit multiple jobs that are identical except for their input data, array jobs are the way to go.

The following example runs 20 copies of MATLAB, with the environment variable PBS_ARRAYID set to each of the values 1 through 20.

#!/bin/bash
 
#PBS -N ARRAY4
#PBS -l nodes=1:ppn=1
#PBS -l vmem=8gb
#PBS -l walltime=10:00:00
#PBS -j oe
 
#PBS -t 1-20
 
module add matlab/2015b
 
cd ${PBS_O_WORKDIR}
 
matlab -nodisplay -nojvm -singleCompThread -r myprog

MATLAB can then read PBS_ARRAYID as a variable using the following code, saved in a file called myprog.m:

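% Read the array index set by PBS (NaN if PBS_ARRAYID is not set)
x = str2double(getenv('PBS_ARRAYID'))
% Build a per-task file name, e.g. x7.txt for task 7
fname = sprintf('x%d.txt', x)
% Save x as plain text; without '-ascii', save writes a binary MAT-file
save(fname, 'x', '-ascii')
quit()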
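The example above uses the array index itself as the data. Often each task should instead process a different input file. One common approach has task N read line N of a list file and pass the name to MATLAB through an environment variable. In the sketch below, the list file inputs.txt, the variable INPUT_FILE and the function input_for_task are all hypothetical names.

```shell
# Hypothetical pattern: inputs.txt (an assumed file) lists one input file per
# line, and array task N processes line N. The function stops with an error
# if PBS_ARRAYID is not set.
input_for_task() {
    local list_file="$1"
    local index="${PBS_ARRAYID:?PBS_ARRAYID is not set}"
    sed -n "${index}p" "$list_file"
}

# Example use in the array job script, before the line that launches MATLAB;
# myprog.m could then read the name with getenv('INPUT_FILE'):
#   export INPUT_FILE="$(input_for_task inputs.txt)"
```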

Currently Installed MATLAB Versions

Software Name    Version
MATLAB           2007b
MATLAB           2008b
MATLAB           2009b
MATLAB           2010b
MATLAB           2011a
MATLAB           2011b
MATLAB           2012a
MATLAB           2012b
MATLAB           2013a
MATLAB           2013b
MATLAB           2014a
MATLAB           2014b
MATLAB           2015a
MATLAB           2015b
MATLAB           2016a
MATLAB           2016b
MCR              2014b