UNSW - Science - HPC

R

R is a free and open source programming language and software environment for statistical computing and graphics. It compiles and runs on a wide variety of UNIX platforms, Windows and macOS. More information about R is available from the website at http://www.r-project.org.

Multiple versions of R, from both the 2.x and 3.x series, are installed on the HPC clusters. To find out which versions of R are installed on the cluster, use the command:

module avail R

which is like the standard module avail command but only shows the versions of R available.
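Once you have loaded a version of R and started it, you may want to confirm from within R which version is actually running. The following is a minimal sketch using base R only; the 3.0.1 threshold is just an example value, not a requirement of the cluster.

```r
# Full version string of the running R
print(R.version.string)

# getRversion() returns a version object that can be compared
running <- getRversion()
print(running)

# Compare against a minimum version (3.0.1 is only an example threshold)
if (running >= "3.0.1") {
  message("This R is version 3.0.1 or later")
} else {
  message("This R is older than 3.0.1")
}
```

A check like this at the top of a job script can catch the case where a different R module was loaded than the one you intended.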

As R is frequently updated, the version that you wish to use may not be currently installed on the cluster. If this is the case (or if any other software is missing) you can email the IT Service Centre (ITServiceCentre@unsw.edu.au) asking for the software to be installed. If it is an R package that is missing, you can also install it for yourself using the instructions below.

Adding functionality to R with R Packages from CRAN

One of the things that makes R so powerful is the collection of additional packages that provide extra functions. These packages can be downloaded from http://cran.r-project.org/mirrors.html.

What R packages are loaded?

To see what R packages are currently loaded you should use the search() command. For example:

> search()
 [1] ".GlobalEnv"        "package:abind"     "package:stats"    
 [4] "package:graphics"  "package:grDevices" "package:utils"    
 [7] "package:datasets"  "package:methods"   "Autoloads"        
[10] "package:base"     
> 
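If you need the same information inside a script, the output of search() can be filtered. This is a small sketch; is_attached is an illustrative helper name, not a function provided by R.

```r
# Keep only the search() entries that are packages and drop the "package:" prefix
attached <- sub("^package:", "", grep("^package:", search(), value = TRUE))
print(attached)

# Illustrative helper: reports whether a named package is currently attached
is_attached <- function(pkg) {
  paste("package:", pkg, sep = "") %in% search()
}
print(is_attached("base"))
```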

What R packages are available to be loaded?

To see which packages are installed and available to be loaded, use the library() command with no arguments. It will give you something like the following response.

> library()
Packages in library ‘/share/apps/R/3.1.1/lib64/R/site-library’:
 
akima                   Interpolation of irregularly spaced data
alr3                    Data to accompany Applied Linear Regression 3rd
                        edition
car                     Companion to Applied Regression
clim.pact               Climate analysis and empirical-statistical
                        downscaling (ESD) package for monthly and daily
                        data.
coda                    Output analysis and diagnostics for MCMC
DEoptimR                Differential Evolution Optimization in pure R
 
> 
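The library() listing is meant for reading on screen. To check for a package from a script, the following sketch uses installed.packages() and .libPaths() from base R; the package name "car" is simply the example used on this page.

```r
# Directories that R searches for packages, in order
print(.libPaths())

# Matrix of installed packages; keep a few informative columns
pkgs <- installed.packages()[, c("Package", "Version", "LibPath"), drop = FALSE]
print(head(pkgs))

# Check for one package by name
print("car" %in% rownames(pkgs))
```

The LibPath column shows whether a package comes from the shared site library or from your personal library.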

How do I load an R package?

You can load any of these packages using the library() command. For example, to load the car package type

> library('car')

and it will now turn up when you use the search() command.
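In a batch job, library() stops the script with an error if the package is missing. As a sketch of one way to handle that, the helper below attaches a package when it is installed and returns FALSE otherwise. The name load_if_available is hypothetical and not part of R.

```r
# Attach a package if it is installed; return TRUE or FALSE instead of stopping.
# load_if_available is an illustrative name, not a built-in R function.
load_if_available <- function(pkg) {
  ok <- suppressWarnings(require(pkg, character.only = TRUE, quietly = TRUE))
  if (!ok) {
    message("Package '", pkg, "' is not installed")
  }
  ok
}

if (load_if_available("car")) {
  # After a successful load the package appears in search()
  print("package:car" %in% search())
}
```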

How do I install an R package if it isn't on the list when I run the library() command?

The process for adding an R package depends on the complexity of the package that you want to use. Once you have determined which R packages you want to use, you can install them for yourself by starting R and then typing the command:

install.packages('package_name')

If you have not created a personal library for R prior to running this command then you will be prompted to create one. As an example, to download and install the R package abind, start R and then type:

install.packages('abind')

The first time you run install.packages(), it will prompt you for an installation location as shown below.

> install.packages('abind')
Installing package into ‘/share/apps/R/3.1.1/lib64/R/site-library’
(as ‘lib’ is unspecified)
Warning in install.packages("abind") :
  'lib = "/share/apps/R/3.1.1/lib64/R/site-library"' is not writable
Would you like to use a personal library instead?  (y/n) y
Would you like to create a personal library
~/R/x86_64-unknown-linux-gnu-library/3.1
to install packages into?  (y/n) y
trying URL 'http://cran.ms.unimelb.edu.au/src/contrib/abind_1.4-3.tar.gz'
Content type 'application/x-gzip' length 21408 bytes (20 Kb)
opened URL
==================================================
downloaded 20 Kb
 
* installing *source* package ‘abind’ ...
** package ‘abind’ successfully unpacked and MD5 sums checked
** R
** inst
** preparing package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded
* DONE (abind)
 
The downloaded source packages are in
        ‘/var/tmp/RtmpqzfO6Y/downloaded_packages’
> 
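The prompts shown above only appear in an interactive session. For a non-interactive job, one possible approach is to create the personal library and pass it to install.packages() explicitly. The sketch below rests on some assumptions: install_to_personal_library is a hypothetical helper, the repository URL is one CRAN mirror chosen for illustration, and R_LIBS_USER is assumed to be set as it is in a default R installation.

```r
# Personal library location that R itself would offer (taken from R_LIBS_USER)
user_lib <- path.expand(Sys.getenv("R_LIBS_USER"))
print(user_lib)

# install_to_personal_library is an illustrative helper, not part of R.
# The default repos URL is an assumption; any CRAN mirror can be substituted.
install_to_personal_library <- function(pkg, lib = user_lib,
                                        repos = "https://cloud.r-project.org") {
  # Create the directory up front so install.packages() has no need to prompt
  dir.create(lib, recursive = TRUE, showWarnings = FALSE)
  # Put the personal library first on the library path for this session
  .libPaths(c(lib, .libPaths()))
  install.packages(pkg, lib = lib, repos = repos)
}

# Example call (commented out because it downloads from the network):
# install_to_personal_library("abind")
```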

Once you have run the install.packages() command, the package will appear in the list given by the library() command and can be loaded with library('package_name'). After it has been loaded it will also show up in the output of the search() command.

Note: When you install a package using the install.packages() command it is installed for the current release series of R, that is 3.0.x or 3.1.x, as reflected in the personal library directory name (for example ~/R/x86_64-unknown-linux-gnu-library/3.1). For example, if you have installed a package for R 3.1.0 it will also work with R 3.1.1 but it will NOT work with R 3.0.1.
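The personal library path in the install output above ends in 3.1, which is the part of the version that decides whether an installed package can be reused. A small sketch for working out that value for the running R, using only base R:

```r
# Major and minor parts of the running R version, e.g. "3" and "1.1"
major <- R.version$major
minor_first <- strsplit(R.version$minor, ".", fixed = TRUE)[[1]][1]

# Release series such as "3.1"
series <- paste(major, minor_first, sep = ".")
print(series)

# Last component of the personal library path R would use, for comparison
print(basename(Sys.getenv("R_LIBS_USER")))
```

If you switch to an R module from a different series, expect to reinstall your packages for it.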

Running R jobs on multiple nodes using Rmpi

If you are running R version 3.0.1 or later on Katana then you can use Rmpi to utilize the power of the Message Passing Interface (MPI) to run R jobs across multiple nodes. Rmpi provides an interface (wrapper) to MPI APIs, in our case OpenMPI version 1.6.5, as well as providing an interactive R slave environment. This allows the compute load to be shared between the different available resources.

To use Rmpi just load R and OpenMPI version 1.6.5 via the module command and then when you are in R load the Rmpi package.
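As a rough illustration of what an Rmpi session can look like, here is a minimal sketch. It assumes that the R and OpenMPI modules are already loaded and that the Rmpi package is installed for the version of R you are running. The worker count of 4 and the toy computation are arbitrary choices for the example.

```r
# Sketch only: assumes the R and OpenMPI modules are loaded and that Rmpi
# is installed for this version of R. If it is not, print a message instead.
if (suppressWarnings(require("Rmpi", character.only = TRUE, quietly = TRUE))) {
  # Start 4 worker R processes (the count is an arbitrary illustration)
  mpi.spawn.Rslaves(nslaves = 4)

  # Ask every worker to report its rank
  print(mpi.remote.exec(mpi.comm.rank()))

  # Spread a simple computation across the workers
  squares <- mpi.parSapply(1:20, function(x) x^2)
  print(squares)

  # Shut the workers down and leave MPI cleanly
  mpi.close.Rslaves()
  mpi.quit()
} else {
  message("Rmpi is not installed for this version of R")
}
```

Note that mpi.quit() also ends the R session, so it belongs at the very end of a script.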

There is an excellent tutorial for Rmpi at Acadia University in Canada.

Useful R resources for High Performance Computing

There are numerous online resources for information on R. A couple of resources specifically for using R in the HPC environment are http://en.wikibooks.org/wiki/R_Programming/Profiling_R_code for information on how to profile your R code and http://en.wikibooks.org/wiki/R_Programming/Parallel_computing_with_R for information on how to write parallel R code.
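To give a flavour of what those resources cover, the sketch below times a serial loop with system.time() and then runs the same work with mclapply() from the parallel package that ships with R. The function slow_square and the choice of 2 cores are made up for illustration, and mclapply() relies on forking, so this form applies to Linux and other UNIX-like systems only.

```r
library(parallel)  # ships with R (2.14.0 and later)

# Deliberately slow toy function used only for illustration
slow_square <- function(x) {
  Sys.sleep(0.1)
  x^2
}

inputs <- 1:8

# Serial baseline, timed with system.time()
serial_time <- system.time(serial <- lapply(inputs, slow_square))

# Forked parallel version; mc.cores = 2 is an arbitrary choice
parallel_time <- system.time(
  forked <- mclapply(inputs, slow_square, mc.cores = 2)
)

print(serial_time["elapsed"])
print(parallel_time["elapsed"])

# Both approaches should produce the same results
print(identical(serial, forked))
```

On the cluster, keep mc.cores no larger than the number of cores requested for the job.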

R Versions Available via Module

Software Name    Version
R                2.14.2
R                2.15.3
R                3.0.1
R                3.1.0
R                3.1.1
R                3.1.1-bioconductor-3.0
R                3.2.1
R                3.2.2
R                3.3.1
R                3.3.2-bioconductor-3.5