From Wiki de Calcul Québec
This page is a translation of the page R; the translation is 100% complete and up to date.




"R is a system for statistical computation and graphics. It consists of a language plus a run-time environment with graphics, a debugger, access to certain system functions, and the ability to run programs stored in script files."

Even though R was not developed for high-performance computing (HPC), its popularity with scientists from a variety of disciplines, including engineering, mathematics, statistics and bioinformatics, makes it an essential tool on HPC installations dedicated to academic research. Features such as C extensions, byte-compiled code and parallelisation allow for reasonable performance in single-node jobs. Thanks to R’s modular nature, users can customize the R functions available to them by installing packages from the Comprehensive R Archive Network (CRAN) into their home directories.

The R interpreter

Once R is available in your environment (for example after loading its module), you can start the R interpreter and type R code at its prompt:

[name@server $] R
R version 2.14.2 (2012-02-29)
Copyright (C) 2012 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: x86_64-unknown-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> values <- c(3,5,7,9)
> values[1]
[1] 3
> q()

To execute R scripts, use the Rscript front-end with the file containing the R commands as an argument:

[name@server $] Rscript computation.R

This front-end automatically passes the scripting-appropriate options --slave and --no-restore to the R interpreter. The --slave option also implies --no-save, which prevents R from writing a useless workspace file on exit.
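As an illustration, computation.R could be a script like the one below. This is a hypothetical sketch: the optional sample-size argument and the statistics it prints are assumptions made for the example. commandArgs(trailingOnly = TRUE) returns only the arguments given after the script name, so the script can be run as `Rscript computation.R 1000`.

```r
# computation.R: hypothetical example script for Rscript
# Arguments placed after the script name on the Rscript command line
args <- commandArgs(trailingOnly = TRUE)

# Optional sample size (assumption for this sketch); defaults to 100
n <- if (length(args) >= 1) as.integer(args[1]) else 100L

set.seed(42)        # reproducible random numbers
x <- rnorm(n)       # n draws from a standard normal distribution

# Results go to standard output, which a job script can redirect to a file
cat("n    =", n, "\n")
cat("mean =", mean(x), "\n")
cat("sd   =", sd(x), "\n")
```

Because the results are written to standard output, the job script can capture them with a simple redirection.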

Parallel jobs

The commands described above can also be used inside jobs submitted to the scheduler. Here is an example of a minimal single-node R submit file for a parallel computation. It combines the commands seen so far and redirects standard output and standard error to files:

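File: r-job-parallel.sh
#!/bin/bash
#PBS -N r-job-name
#PBS -A xxx-yyy-zz
#PBS -l nodes=1:ppn=8
#PBS -l walltime=300
# walltime=300 is in seconds (5 minutes)
# Load the compiler, MKL, Java and R modules
module load compilers/intel/12.0.4 blas-libs/mkl/10.3.4 java/jdk1.6.0 apps/r/2.14.2
# The job starts in the home directory; move to the submission directory
cd "$PBS_O_WORKDIR"
# Run the script, sending standard output and standard error to separate files
Rscript computation.R > computation.out 2> computation.err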

Since this submit file launches a single R process on one 8-core Colosse node, the script computation.R should perform the bulk of its work with multi-threaded or parallel functions that can use all of the available cores. If this is not the case for your R script, consider using task arrays or GNU parallel instead.
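One way for computation.R to use all the cores of the node is the base parallel package, included with R since version 2.14.0. The sketch below assumes the job has the whole node to itself (so that detectCores() matches the ppn=8 request) and uses a hypothetical simulate() function as a stand-in for the real work. mclapply() forks the R process, so it only works within a single node and on Unix-like systems such as Linux compute nodes.

```r
library(parallel)

# Cores on the node; assumes the job has the whole node to itself.
# max(..., na.rm = TRUE) falls back to 1 if detection fails (NA).
n_cores <- max(1L, detectCores(), na.rm = TRUE)

# Hypothetical unit of work: the mean of a large random sample
simulate <- function(i) {
  set.seed(i)                 # per-task seed for reproducibility
  mean(rnorm(1e5))
}

# Run 32 independent tasks spread over forked worker processes
results <- mclapply(1:32, simulate, mc.cores = n_cores)

cat("tasks completed:", length(results), "\n")
cat("overall mean   :", mean(unlist(results)), "\n")
```

Seeding inside each task keeps the results identical no matter how the tasks are distributed among the workers.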

There are also many packages on CRAN for distributing R computations. Most of them are listed in the CRAN Task View: High-Performance and Parallel Computing with R.
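Several of these packages, such as snow, start a set of worker R processes and send tasks to them. The base parallel package incorporates functionality from snow and offers the same model. The sketch below assumes four free cores on the node and uses hypothetical data and a hypothetical fit_one() task. Unlike forked processes, the workers start with an empty workspace, so the data they need must be exported explicitly.

```r
library(parallel)

# Hypothetical data shared by all tasks
x <- seq(0, 10, length.out = 200)
y <- 2 * x + 1 + rnorm(200)

# Hypothetical task: slope of a linear fit on a bootstrap resample
fit_one <- function(i) {
  idx <- sample(length(x), replace = TRUE)
  coef(lm(y[idx] ~ x[idx]))[2]
}

# Start 4 worker R processes on this node (assumption: 4 free cores)
cl <- makeCluster(4)
clusterExport(cl, c("x", "y"), envir = environment())  # copy data to workers
clusterSetRNGStream(cl, 123)        # independent, reproducible random streams

slopes <- unlist(parLapply(cl, 1:100, fit_one))
stopCluster(cl)                     # always release the workers

cat("bootstrap mean slope:", mean(slopes), "\n")
```

This worker model is what lets some of these packages extend the same code beyond a single node, whereas forking with mclapply() cannot.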

Installing R packages

To install packages from CRAN, you can use the install.packages() function inside the R interpreter. For example, to install the sp package, which provides classes and methods for spatial data, run the following commands on a login node:

[name@server $] R
> install.packages("sp")

When asked, select an appropriate mirror for download. Ideally, it will be geographically close to you.
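The mirror prompt can be avoided, for example in a non-interactive script, by passing a repository URL through the repos argument of install.packages(). The sketch below defines a hypothetical helper, ensure_package(), that installs a package into your personal library (the directory named by the R_LIBS_USER environment variable) only when it is not already installed. The mirror URL is an assumption; any CRAN mirror works, and an HTTPS URL requires an R version that can download over HTTPS.

```r
# Hypothetical helper: install a CRAN package into the personal library
# only when it is missing. The default mirror URL is an assumption.
ensure_package <- function(pkg,
                           repos = "https://cloud.r-project.org",
                           lib = path.expand(Sys.getenv("R_LIBS_USER"))) {
  if (pkg %in% rownames(installed.packages())) {
    return(invisible(FALSE))        # already installed, nothing to do
  }
  # The personal library may not exist yet on a fresh account
  dir.create(lib, recursive = TRUE, showWarnings = FALSE)
  install.packages(pkg, lib = lib, repos = repos)
  # Make the new directory visible to library() in this session
  .libPaths(c(lib, .libPaths()))
  invisible(TRUE)
}

# Example: install sp without the interactive mirror prompt
# ensure_package("sp")
```

Checking installed.packages() first makes the call cheap to repeat at the top of a script.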

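Some packages require the TMPDIR environment variable to point to a writable directory before they can be installed. R reads TMPDIR when it starts, so define it in your shell before launching R.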

To install a package archive that you downloaded yourself (i.e. not from CRAN), for example one named archive_package.tgz, run the following command in a shell:

[name@server $] R CMD INSTALL archive_package.tgz
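A similar installation can be done from inside R: with repos = NULL, install.packages() treats its argument as the path to a local file rather than a package name on a CRAN mirror. In the minimal sketch below, the helper name install_local_package() is hypothetical, and the target is assumed to be the personal library named by R_LIBS_USER.

```r
# Hypothetical helper: install a package from a local source archive
install_local_package <- function(archive,
                                  lib = path.expand(Sys.getenv("R_LIBS_USER"))) {
  if (!file.exists(archive)) {
    stop("package archive not found: ", archive)
  }
  # The personal library may not exist yet on a fresh account
  dir.create(lib, recursive = TRUE, showWarnings = FALSE)
  # repos = NULL: install from the file, not from a CRAN mirror
  install.packages(archive, lib = lib, repos = NULL, type = "source")
  # Make the library visible to library() in this session
  .libPaths(c(lib, .libPaths()))
  invisible(TRUE)
}

# Example, similar to running R CMD INSTALL in the shell:
# install_local_package("archive_package.tgz")
```

Checking for the file first gives a clear error message instead of a failure deep inside the installer.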
