Matlab

From the Calcul Québec wiki

Licensing Information

In general, Calcul Québec servers do not provide site-wide licenses for Matlab. However, there may be installations which are licensed for specific individuals, or users from specific groups or institutions. The Matlab MDCS license on Guillimin may be used by any Guillimin user provided they have a valid Matlab license for their desktop computer.

Submitting Matlab Jobs

Submitting Single-core Jobs

All computationally intensive Matlab jobs must be submitted to the queuing system. Do not use the login nodes for any memory- or CPU-intensive tasks. Before submitting your job, make sure that the path to the Matlab distribution is added to your PATH in your .bashrc file.
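
As a minimal sketch, the lines below show what such a .bashrc entry might look like. The directory /opt/matlab/bin is purely a placeholder assumption; use the installation path documented for the cluster you are working on.

```shell
# Hypothetical location of the Matlab "bin" directory; replace it with
# the path documented for the cluster you are using.
MATLAB_BIN="/opt/matlab/bin"

# Prepend the directory to PATH only if it is not already listed,
# so that sourcing .bashrc several times does not duplicate the entry.
case ":$PATH:" in
    *":$MATLAB_BIN:"*) ;;
    *) PATH="$MATLAB_BIN:$PATH" ;;
esac
export PATH

# Report which matlab executable (if any) the shell now resolves.
command -v matlab || echo "matlab not found in PATH; check MATLAB_BIN"
```

The guard around the assignment is a design choice rather than a requirement: it keeps PATH from growing each time a new shell is started from an existing one.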

The following batch file can be used to submit your Matlab job:

File : singlecore.pbs
#!/bin/bash
#PBS -l nodes=1:ppn=1
#PBS -V
#PBS -N matlab_job
 
cd your_matlab_project_directory
 
matlab -nodisplay < your_matlab_code.m > output


After that, submit your batch file to the queuing system (see job submissions and the documentation specific to each site).
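
The submission step could look roughly like the sketch below, assuming a Torque/PBS scheduler where jobs are submitted with qsub. The helper name submit_job is illustrative only; the exact commands and options for each site are described in that site's own documentation.

```shell
# Submit a PBS batch file and print the job identifier returned by qsub.
# The function name submit_job is illustrative, not part of any cluster tool.
submit_job() {
    script="$1"
    if [ ! -f "$script" ]; then
        echo "error: batch file '$script' not found" >&2
        return 1
    fi
    if ! command -v qsub >/dev/null 2>&1; then
        echo "error: qsub not found; run this on a cluster login node" >&2
        return 2
    fi
    qsub "$script"
}

# Typical use on a login node:
#   submit_job singlecore.pbs
#   qstat -u "$USER"      # list your queued and running jobs
```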

Submitting Multicore Jobs

You can use the Matlab Parallel Toolbox to run parallel computations within a single compute node. The toolbox lets you open as many workers as the node has cores, up to a maximum of 12 (or the total number of cores on a reserved node). You use this feature the same way as on your desktop, except that the job must be wrapped in a submission script and sent to the scheduler for execution. Submitting this type of job is similar to submitting an OpenMP parallel job. Here is a submission script template:

File : multicore.pbs
#!/bin/bash
#PBS -l nodes=1:ppn=12
#PBS -V
#PBS -N parallel_matlab_job
 
cd your_matlab_project_directory
 
matlab -nodisplay < your_matlab_code.m > output


This script reserves 12 cores on one node, so you can open at most 12 workers in your Matlab code. Remember to change the "ppn" property to match the number of workers you open in your code. Also remember that you can NOT reserve more than one node for your Matlab job, even when it is parallelized with the Parallel Toolbox.
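
One way to keep the worker count and "ppn" in step is to derive the count inside the job script instead of hard-coding it. The sketch below assumes the common Torque/PBS convention that the file named by PBS_NODEFILE holds one line per reserved core; the variable name NWORKERS and the helper count_cores are hypothetical.

```shell
# Count the cores granted to the job. Under Torque/PBS, the file named by
# PBS_NODEFILE normally holds one line per reserved core; fall back to 1
# when the variable is unset (for example when testing outside a job).
count_cores() {
    if [ -n "${PBS_NODEFILE:-}" ] && [ -f "$PBS_NODEFILE" ]; then
        wc -l < "$PBS_NODEFILE" | tr -d ' '
    else
        echo 1
    fi
}

# NWORKERS is a hypothetical variable name; export it before starting Matlab.
NWORKERS=$(count_cores)
export NWORKERS
echo "Matlab may open up to $NWORKERS workers"
```

Inside the Matlab code, the value could then be read with str2double(getenv('NWORKERS')) and used when opening the pool, so that changing "ppn" in the batch file is the only edit required.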

General advice

A few hints about using Matlab on any big cluster:

  • Use the cluster only for really big computational tasks. If your fit (or any other task) takes ~15 min, why not just run it on your desktop or laptop?
  • Remember that whatever you submit to a compute node becomes "sealed" there, which means no live plotting is possible. Please adapt your script accordingly. The usual approach is to do the heavy calculations on the cluster and plot the resulting data later on a workstation. If you absolutely need to plot something during the calculation, save it to a file without any GUI output.
  • The cluster scheduler treats a Matlab job like any other serial or threaded job. You can use only a single node, with between 1 and 12 cores (limited by the Parallel Toolbox and by the number of cores per node). If you do not have previous experience, please read the Running jobs page carefully.
  • Matlab jobs fall into the category of either serial or OpenMP jobs, depending on whether or not you use the Parallel Toolbox.

Additional server-specific documentation

Compiling Matlab Scripts

Introduction

While Briarée does not have a Matlab license, it does have the (freely distributed) Matlab Compiler Runtime, a set of libraries that allows users to run Matlab scripts that have been compiled into a binary using the mcc program that comes with Matlab. The compilation must be performed on a computer that has Matlab and runs the same operating system as the target machine: a Windows installation of Matlab creates a Windows binary, which will not run on Briarée. You must therefore compile your Matlab script using a Matlab installation running under Linux. Note that the only Matlab Compiler Runtime versions installed on Briarée at this time are 7.17, 8.0 and 8.3, which correspond to Matlab R2012a, R2012b and R2014a respectively, and that the runtime version must match the version of the Matlab compiler used to build the binary.

Compiling Your Script

The Matlab compiler mcc should normally be in the same directory as the Matlab binary itself, and you will need to run it from the command line, i.e. in a terminal. The compiler has a help page, displayed by the command mcc -help, which lists all the options that can be used with the compiler. To run a compiled Matlab script on Briarée, you need to choose the "standalone program" option, which is -m. The command generates two output files: the first is the actual binary file (akin to an .exe file in Windows), while the second is a wrapper script with a filename ending in .sh. When you run your Matlab program on Briarée, you will call this script, which configures your environment and then calls the binary file that has been created. The command below should work on a workstation managed by Concordia's Faculty of Engineering and Computer Science; it compiles a script called foo.m using the R2014a version of Matlab.

[name@server $] /encs/pkg/matlab-R2014a/root/bin/mcc -m foo.m
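
Before copying anything to the cluster, it can be worth confirming that the compiler produced both files. The following is a small sketch of such a check; the helper name check_mcc_output is hypothetical, and the file names follow the foo / run_foo.sh pattern described in this section.

```shell
# Check that mcc produced both files needed on the cluster for a script
# named <name>.m: the binary <name> and the wrapper run_<name>.sh.
check_mcc_output() {
    name="$1"
    dir="${2:-.}"
    status=0
    for f in "$dir/$name" "$dir/run_$name.sh"; do
        if [ ! -f "$f" ]; then
            echo "missing: $f" >&2
            status=1
        fi
    done
    return $status
}

# Typical use after compiling foo.m in the current directory:
#   check_mcc_output foo && echo "foo and run_foo.sh are ready to copy"
```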


Running the Binary on Briarée

You will need to copy the two files (e.g. foo and run_foo.sh) from your workstation to your account on Briarée using one of the standard techniques, such as scp or sftp. If your Matlab script depends on data files in order to run correctly, you will also need to copy these to Briarée and arrange them in the same directory structure as on your workstation. If your files contain a lot of data or you foresee doing a lot of I/O, all of your files should go in $SCRATCH on Briarée. Once this is done, you can submit a Matlab job to the scheduler. A sample script is included below; it assumes that you are using the R2014a version of Matlab and that you have placed the files in the directory $SCRATCH/matlab_jobs.
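
The copy step might be scripted along the following lines. This is only a sketch: the user name, the host name briaree.example.org and the remote directory are placeholders, not real values, and the helper copy_to_cluster is hypothetical. By default the command is printed rather than executed, so it can be reviewed first.

```shell
# Hypothetical values: replace them with your own user name, the login
# host given in the Briarée documentation, and your scratch directory there.
REMOTE_USER="username"
REMOTE_HOST="briaree.example.org"
REMOTE_DIR="/path/to/scratch/matlab_jobs"

# Build the scp command for the given files. With DRY_RUN=1 (the default
# here) the command is only printed, so it can be reviewed before use.
copy_to_cluster() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo scp "$@" "$REMOTE_USER@$REMOTE_HOST:$REMOTE_DIR/"
    else
        scp "$@" "$REMOTE_USER@$REMOTE_HOST:$REMOTE_DIR/"
    fi
}

copy_to_cluster foo run_foo.sh
```

Setting DRY_RUN=0 in the environment makes the same call perform the actual transfer.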

Job Submission Script for Briarée

File : submit_matlab_briaree.pbs
#!/bin/bash
#PBS -N NameOfTheJob
#PBS -A abc-123-aa
#PBS -l nodes=1:ppn=12
#PBS -l walltime=12:00:00
 
cd $SCRATCH/matlab_jobs
 
module load Matlab-compiler-runtimes/8.3
./run_foo.sh /home/apps/Logiciels/MATLAB/MATLAB_Compiler_Runtime/v8.3/v83/

MDCS Documentation

Introduction

IMPORTANT: The license expired on June 30, 2017, and will not be renewed. Any job submitted to Guillimin using the procedure below will fail.

Jobs using the Matlab Distributed Computing Server (MDCS) on Guillimin require 2 Matlab licenses and 2 Matlab installations: The MDCS installation/license on Guillimin (provided by us), and a licensed installation of Matlab with the Parallel Computing Toolbox on the user's desktop computer (provided by you or your institution). Matlab installations activated through McGill's Software Centre include the parallel computing toolbox. If you have installed Matlab from another source, please ensure that you have the required prerequisites. MDCS allows you to use any toolbox licenses provided they are also licensed on your desktop installation of Matlab.

IMPORTANT: Parallel Matlab on Guillimin works with versions 2012a, 2012b, 2013a, 2013b, 2014a, 2014b, 2015aSP1, and 2015b. Other versions are not supported. Please upgrade your local installation if necessary.

IMPORTANT: Please do not try to launch the MDCS Matlab binaries directly on Guillimin; doing so produces a license error. The MDCS Matlab installation on Guillimin is not a standard Matlab installation like the one on your desktop. Its only role is to "transfer" the job from your desktop or laptop and to start it on multiple nodes of the cluster.

Setting up your parallel Matlab environment

The MDCS system accepts batch job submissions from a user's desktop computer to be scheduled and run through the Guillimin scheduler. Before submitting a job, the Matlab installed on the user's computer must be configured for submission to Guillimin.

Please follow these steps to set up your parallel Matlab environment correctly:

  1. Download our configuration archive, and unpack it on your local machine ($ tar -xvf guillimin_mdcs_config.tar.gz). It contains "config" and "examples" folders. Put the "examples" somewhere in your project directory on your local machine - you will use these simple examples as test runs and as a reference.
  2. The "config" folder contains all necessary configuration files. First, copy all "config/toolbox-local/*" files to the "<your_matlab_install>/toolbox/local" folder on your local machine.
  3. Create a profile for the Guillimin cluster in Matlab
    • Restart Matlab
    • At the Matlab command prompt, run: glmnConfigCluster
      • Warning: This command will delete any previous cluster profiles named guillimin
    • Matlab will prompt you for some information about your computer. Please be careful, as entering incorrect information can make your profile unusable. If you make an error, you can reset your cluster profile by re-running glmnConfigCluster. Please enter:
      • A unique identifier for your local computer (for example, the hostname, or a description like 'lab7' or 'laptop'). Do not use spaces.
      • Your home directory on your local computer (for example, /home/alex on Linux, /Users/alex on Mac, or C:\Users\alex on Windows)
      • Your home directory on Guillimin (for example, /home/alex). You may also specify a folder in your project space or any folder that you have write access to.
      • You may be asked to specify the Matlab path on Guillimin if our script hasn't been configured for your Matlab release. The versions that we support are as follows:
        • 2012a: /software/applications/matlab-2012a-para
        • 2012b: /software/applications/matlab-2012b-para
        • 2013a: /software/applications/matlab-2013a-para
        • 2013b: /software/CentOS-6/applications/matlab-2013b-para
        • 2014a: /software/CentOS-6/applications/matlab-2014a-para
        • 2014b: /software/CentOS-6/applications/matlab-2014b-para
        • 2015aSP1: /software/CentOS-6/applications/matlab-2015a-para
        • 2015b: /software/CentOS-6/applications/matlab-2015b-para
    • You should now have a cluster profile called 'guillimin' in your Matlab 'manage cluster profiles' menu.
    • Log in to Guillimin using ssh and create your Matlab job folder (your details will be different from this example)
    mkdir -p /home/username/.matlab/jobs/myLaptop/guillimin/R2014b
  4. This is the end of the configuration procedure. We advise restarting Matlab at this point.
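
The job folder created in step 3 appears to combine the local computer identifier, the profile name and the Matlab release. The sketch below builds the path from those parts; note that this layout is inferred from the single example above and is an assumption, and the helper name job_folder is hypothetical.

```shell
# Build the job folder path from its parts. The layout
# <home>/.matlab/jobs/<identifier>/guillimin/<release> is inferred from the
# example in the steps above and is an assumption, not a documented rule.
job_folder() {
    home_dir="$1"      # your home directory on Guillimin
    identifier="$2"    # the identifier given to glmnConfigCluster
    release="$3"       # Matlab release on your desktop, e.g. R2014b
    echo "$home_dir/.matlab/jobs/$identifier/guillimin/$release"
}

# Run on Guillimin, for example:
#   mkdir -p "$(job_folder "$HOME" myLaptop R2014b)"
job_folder /home/username myLaptop R2014b
```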

Validation

Important: Note that you must have a valid glmnPBS.m file in your working directory during validation. The examples/TestParfor/glmnPBS.m is an example of a valid glmnPBS.m file. For more information, please read the next section.

If you use the automatic validation of the Guillimin cluster profile, please expect the final test (MATLAB pool test) to fail. This functionality is not supported on our system. We recommend performing a manual validation for distributed and parallel batch jobs instead of the automatic validation. Please send us the validation output if you experience any problems with the validation tests.

  • Distributed job
setSchedulerMessageHandler(@disp)
cluster = parcluster('guillimin');
cluster.NumWorkers = 3;
job = createJob (cluster);
createTask(job, @sum, 1, {[1 1]});
submit(job);
wait(job);
out = fetchOutputs(job)
  • Parallel job
setSchedulerMessageHandler(@disp)
cluster = parcluster('guillimin');
cluster.NumWorkers = 3;
job = createCommunicatingJob(cluster, 'Type', 'spmd') ;
createTask(job, @labindex, 1, {});
submit(job);
wait(job);
out = fetchOutputs(job)

Submitting your parallel Matlab jobs

The jobs are submitted from within the Matlab session on your local machine using the glmnPBS.submitTo(cluster) command after configuring the glmnPBS.m file.

IMPORTANT: You must always have a "glmnPBS.m" file in the same directory as the script you are submitting for execution (see examples). In this file you manually set the mandatory submission parameters for your job. The following is an example of the property definitions at the top of the glmnPBS.m file.

classdef glmnPBS
    %Guillimin PBS submission arguments
    properties
        % Local script, remote working directory (home, by default)
        localScript = 'TestParfor';
        workingDirectory = '.';

        % nodes, ppn, gpus, phis and other attributes
        numberOfNodes = 1;
        procsPerNode = 6;
        gpus = 0;
        phis = 0;
        attributes = '';

        % Specify the memory per process required
        pmem = '1700m'

        % Requested walltime
        walltime = '00:30:00'

        % Please use metaq unless you require a specific node type
        queue = 'metaq'

        % All jobs should specify an account or RAPid:
        % e.g.
        % account = 'xyz-123-aa'
        account = '';

        % You may use otherOptions to append a string to the qsub command
        % e.g.
        % otherOptions = '-M email[at]address.com -m bae'
        otherOptions = '';
    end

    % ...
end

Before submitting your job, it is recommended that you review your current submission profile using the getSubmitArgs() function:

test = glmnPBS();
test.getSubmitArgs()
  • Also, be aware that in the MDCS model the master process is not used for matlabpool procedures (parfor, spmd). Therefore, N cores are effectively reserved for the job on Guillimin, but only N-1 Matlab workers will execute the parallel code. In the previous example, there would be 5 workers in the matlabpool.
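
The worker count can be worked out ahead of time from the values in glmnPBS.m. The sketch below assumes that the number of reserved cores N is numberOfNodes × procsPerNode; the helper name mdcs_workers is hypothetical.

```shell
# Number of Matlab workers available to parfor/spmd under MDCS:
# one of the reserved cores is taken by the master process.
mdcs_workers() {
    nodes="$1"
    ppn="$2"
    echo $(( nodes * ppn - 1 ))
}

mdcs_workers 1 6    # the glmnPBS.m example above: prints 5
```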

After reviewing your submission arguments, you may submit your job:

cluster = parcluster('guillimin');
glmnPBS.submitTo(cluster);

The glmnPBS.submitTo(cluster) function will submit the localScript script to Guillimin inside a job with properties defined in glmnPBS.m. This function is simply a wrapper for Matlab's batch() function (you may wish to customize it yourself).

    methods(Static)
        function job = submitTo(cluster)
            opt = glmnPBS();
            job = batch(cluster,    opt.localScript,     ...
                'matlabpool',       opt.getNbWorkers(),  ...
                'CurrentDirectory', opt.workingDirectory ...
                );
        end
    end

During the submission process you will be asked for your username on the Guillimin cluster. A pop-up window will also ask whether you want to use an "identity file" for the cluster connection. You may select "Yes" and choose a valid OpenSSH private key file, or select "No", in which case you will be asked for your Guillimin password.

You can check the status of your submitted job from the Matlab GUI: Parallel --> Monitor Jobs. However, logging in to Guillimin and using our job management commands is a more direct way of checking.

HINT: You cannot monitor the progress of your calculations from the Matlab GUI, because all processes on remote nodes are "sealed" there and no STDOUT from the workers is available. The only way to follow the progress of your computations is to write periodic output to a text file with fprintf statements (see examples). You can then log in directly to Guillimin and check the content of that file.
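
Checking such a file on Guillimin could be done as in the sketch below. The file name progress.txt and the helper show_progress are hypothetical; use whatever file name your own script writes to.

```shell
# Show the most recent lines of a progress file written by the job.
# The file name progress.txt is a hypothetical example; use whatever
# name your script passes to fopen/fprintf.
show_progress() {
    file="${1:-progress.txt}"
    if [ -f "$file" ]; then
        tail -n 5 "$file"
    else
        echo "no progress file yet: $file"
    fi
}

# On Guillimin, from the job's working directory:
#   show_progress progress.txt
#   tail -f progress.txt     # follow the file as new lines are appended
```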

IMPORTANT: It is possible to transfer data files and additional script files (or whole directories) from your local machine to the cluster during job submission. This is done via special options to the batch command (see the TestSMPD example and the batch help). However, please do not do this with large data files. Instead, copy your data to Guillimin separately before running your calculations. Likewise, save large outputs on the Guillimin filesystem and transfer them to your local machine afterwards.