Moab

From the Calcul Québec Wiki

Overview

Moab is a job scheduler developed by Adaptive Computing. It interacts with a resource manager (often Torque) to determine which compute nodes are free and can run jobs, then uses a fair-share formula to decide which job will run next.

Moab is used by Colosse and Guillimin, as well as by many other Compute Canada supercomputers. It is also used by large computing centres around the world, such as Lawrence Livermore National Laboratory, which has created a good tutorial for its users.


Usage

As explained on the page Running jobs, users typically interact with Moab through a submission script.

Job submission: 'msub'

'msub' is the equivalent of 'qsub' in Torque. It accepts the same arguments but offers some additional features. The list of arguments for 'msub' can be obtained from the man page or from the vendor's documentation.

All of the options for 'msub' can be specified by a line of the form

#PBS option

in the submission script, or directly from the command line

[name@server $]  msub option


Further information about important 'msub' options can be found in the following sub-sections.

Job name '-N'

Each job can be given a name, allowing users to distinguish between their jobs. Use a descriptive name to easily identify your jobs.

Account '-A'

Each job is associated with a particular account using the '-A' option, followed by the RAPI number of your group (in the form abc-123-aa).

Resources required '-l'

The resources required for a job are specified using the '-l' option. Several resources can be listed if separated by commas. For example, '-l nodes=32:ppn=2,pmem=1800mb,walltime=3600'.

  • walltime: time required, in seconds (e.g. -l walltime=300). You can also specify the walltime using the format [[[dd]:[hh]:[mm]]:[ss]]; for example, 48 hours is -l walltime=48:00:00.
  • nodes=x:ppn=y: number of nodes and processors (cores) per node (e.g. -l nodes=2:ppn=8).
  • feature=xxxx: specific features required for the nodes (e.g. -l feature='48g'; see examples below).
  • procs=x: total number of processors (cores) required (e.g. -l procs=32). Warning: with this option, you will not be able to control how your processes are distributed within the cluster.
  • gres=...: generic resources (e.g. -l gres=matlab:1). Moab uses this option to manage software licenses.
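Since walltime accepts either a number of seconds or the [[[dd]:[hh]:[mm]]:[ss]] format, a small Bash helper (hypothetical, not part of Moab) can convert seconds into the hh:mm:ss form before submission:

```shell
#!/bin/bash
# Hypothetical helper: convert a walltime given in seconds into the
# hh:mm:ss form accepted by '-l walltime='.
secs_to_walltime() {
    local total=$1
    printf '%02d:%02d:%02d\n' \
        $((total / 3600)) $((total % 3600 / 60)) $((total % 60))
}

secs_to_walltime 300      # 00:05:00
secs_to_walltime 172800   # 48:00:00
```

One could then write, for example, -l walltime=$(secs_to_walltime 172800) when building an msub command in a script.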

Specifying the location of output files

By default, the output files for a job are placed in the directory from which the job was submitted, although this is not always desired.

The path for the output file can be specified using the following options:

  • -o: path of the standard output stream (stdout), relative to the working directory.
  • -e: path of the standard error stream (stderr), relative to the working directory.

For example:

#PBS -o $HOME/job.out
#PBS -e $HOME/job.err


Email notifications: -m and -M

The following options control email notifications sent to users regarding their submitted jobs:

  • -M: destination email address. Multiple email addresses can be specified, separated by commas.
  • -m: specify which events you would like to be notified of:
    • b: when the job begins
    • e: when the job ends
    • a: when the job is aborted

Example: send an email to user@address.ca when the job starts, finishes, or is aborted.

#PBS -M user@address.ca
#PBS -m bea

Submitting jobs in groups (job arrays)

Note: Job arrays are not available on all servers.

The options mentioned above can also be used to submit jobs in batches using job arrays. This is particularly useful for data parallelism (i.e. running the same application on multiple data sets). The "-t" option of msub specifies how many instances to start: Moab creates a job with multiple sub-jobs, each responsible for one instance of the application. Different data sets can then be selected using the variable $MOAB_JOBARRAYINDEX. For example:


File : submit-myjobarray.sh
#!/bin/bash
#PBS -N MyJobArray
#PBS -A abc-123-aa
#PBS -l nodes=1:ppn=8
#PBS -l walltime=7200
#PBS -t [1-120]%10
/path/to/my/program /path/to/my/data/files/$MOAB_JOBARRAYINDEX


In the previous example, the line "#PBS -t [1-120]%10" instructs Moab to create sub-jobs 1 to 120 and to run at most 10 of them simultaneously. The variable $MOAB_JOBARRAYINDEX contains the identification number of the current sub-job and can be used to read a different file in each sub-job.

To number the sub-jobs 2, 4, 6, 8, one would use the following syntax:

#PBS -t [2-8:2]

To give each sub-job a specific identification number, one would use the following syntax:

#PBS -t [4,7,8,15]

Please note that it is not possible to pass the $MOAB_JOBARRAYINDEX variable as an option to Moab (e.g. in the submission script in a line beginning with #PBS). The $MOAB_JOBARRAYINDEX variable is not created until after the 'msub' command has been executed.
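The index is, however, available in the body of the script at run time. As a sketch, assuming a hypothetical data layout with zero-padded file names (data-001.txt, data-002.txt, ...), each sub-job can build its own input file name:

```shell
#!/bin/bash
# Build a per-sub-job input file name from $MOAB_JOBARRAYINDEX.
# The 'data-NNN.txt' layout is hypothetical; the ':-' fallback lets
# the script be tested outside of Moab, where the variable is unset.
index=${MOAB_JOBARRAYINDEX:-7}
input=$(printf 'data-%03d.txt' "$index")
echo "This sub-job would process: $input"
```

Zero-padding the index keeps the generated file names aligned with a sorted directory listing.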

If the options -e or -o are specified for a job array, the output file will contain the output of all the sub-jobs in the array, which is usually not what is desired. It is possible to use wildcards to include the job or sub-job number in the output file names: the characters %J are replaced by the job number returned by Moab upon submission, while the characters %I are replaced by the sub-job number. For example, consider adding the following lines to the submission file:

#PBS -o MyJobArray-%I.out
#PBS -e MyJobArray-%I.err

In this case, our job will produce numbered files representing each sub-job in the job array: MyJobArray-1.err, MyJobArray-1.out, MyJobArray-2.err, MyJobArray-2.out, ...

Job Control : mjobctl

The mjobctl command allows users to manage their jobs after submission. It combines the functionality of qdel and canceljob in a single command, and also allows users to modify their jobs or place them on hold.

Cancelling a job: -c

To cancel a job, use the following command:

[name@server $] mjobctl -c <jobid>


This command asks the job to finish cleanly. The resource manager sends a SIGTERM signal followed by a SIGKILL to your job. Your job can intercept the SIGTERM, for example, to save its state to disk before exiting.
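As a sketch, a Bash job script can install a handler for SIGTERM. The child shell below stands in for the job being cancelled, and 'checkpoint.dat' is a hypothetical checkpoint file:

```shell
#!/bin/bash
# Trap the SIGTERM sent by the resource manager and save state before
# exiting. The job body is simulated in a child shell so the effect of
# the signal can be shown directly.
output=$(bash -c '
    trap "echo intermediate results > checkpoint.dat; echo checkpoint written; exit 0" TERM
    kill -TERM $$     # stands in for the scheduler cancelling the job
    echo "never reached"
')
echo "$output"
rm -f checkpoint.dat  # clean up the demonstration file
```

In a real job, the trap handler would write the application's actual state rather than a placeholder line.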

If this command fails (for example, if the node is unreachable), you can force the removal of the job via

[name@server $] mjobctl -F <jobid>



Place or remove a job hold: -h and -u

You can place a job on hold or remove a hold from a job using the options -h (hold) and -u (unhold), respectively. For example,

[name@server $] mjobctl -h <jobid>


will prevent the execution of a job, while

[name@server $] mjobctl -u <jobid>


will return the job to the queue.

Control a group of jobs: -w

You can execute a command on a group of jobs with the option -w <ATTR>=<VAL>. ATTR can take on several values (see the documentation on mjobctl for the list). A common use of this option is to control all of your jobs. For example, to cancel all of your jobs, you can use the following command:

[name@server $] mjobctl -c -w user=$USER


List jobs: showq

The showq command displays the current queue of jobs. There are several options available for this command. A few common ones are listed here:

  • -r: display running jobs
  • -i: display idle jobs (resources not yet available)
  • -b: display blocked jobs
  • -c: display completed jobs
  • -w user=username: display jobs submitted by the user with the given username
  • -help: display the available options
  • --blocking: bypass the cached results and query the Moab server directly

The complete list of options can be found in the Moab documentation. Note that 'showq' displays cached information by default, so the display can lag behind the Moab server; use the "--blocking" option to force synchronization.

List of job states

  • R (Running): the job is currently executing.
  • S (Starting): the system has accepted the job and is currently running its pre-start tasks.
  • C (Complete): the job has finished.

Diagnosing jobs: checkjob

The checkjob command displays detailed information about a job: the list of nodes being used, the resources consumed, environment variables, the names of the error and output files, the submission script, etc. The -v option increases verbosity and can be given twice. The command takes the job ID as an argument. For example:

[name@server $] checkjob <jobid>


[name@server $] checkjob -v <jobid>


[name@server $] checkjob -v -v <jobid>


The list of nodes can also be obtained using the mjobctl command:

[name@server $] mjobctl -q hostlist <jobid>


Environment variables

Moab defines a certain number of environment variables. To make them available in the shell environment created for your job, you must add the -E option to the msub command, or specify the option in your submission script:

#PBS -E

On Colosse, this option is activated by default.

  • MOAB_JOBNAME: job name as specified by the -N option (e.g. my_job)
  • MOAB_USER: username of the user who submitted the job (e.g. my_name)
  • MOAB_TASKMAP: list of nodes and number of cores reserved for the job (e.g. r106-n3:8&r102-n3:8)
  • MOAB_CLASS: name of the job class, i.e. the queue name (e.g. short)
  • MOAB_PROCCOUNT: number of processors requested for the job (e.g. 16)
  • MOAB_GROUP: Unix group name of the user who submitted the job (e.g. my_group)
  • MOAB_NODELIST: list of nodes reserved for the job (e.g. r106-n3&r102-n3)
  • MOAB_ACCOUNT: name of the account (project) the job is running under; the CPU time used will be associated with this project (e.g. xxx-yyy-zz)
  • MOAB_NODECOUNT: number of nodes reserved for the job (e.g. 2)
  • MOAB_JOBID: Moab identification number of the job; a sub-job number, if any, appears in square brackets (e.g. 10238401[84])
  • MOAB_JOBARRAYINDEX: the sub-job number within a job array (e.g. 84)
  • MOAB_QOS: quality of service assigned to the job (e.g. Alloc)
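As an illustration, a submission script (with '#PBS -E' active) might log a few of these variables at startup. The ':-' fallback values below are stand-ins so the sketch also runs outside of Moab:

```shell
#!/bin/bash
# Log a few Moab variables at the start of a job. The fallbacks are
# illustrative values used when the variables are not set by Moab.
jobid=${MOAB_JOBID:-10238401}
account=${MOAB_ACCOUNT:-abc-123-aa}
nodes=${MOAB_NODELIST:-r106-n3}

summary="Job $jobid (account $account) on nodes: $nodes"
echo "$summary"
```

Such a line in the job's output file makes it easy to tell, after the fact, where a job ran and which allocation it was charged to.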

Server-specific instructions

Colosse

Required parameters

On Colosse, the parameters -N, -A, -l walltime= and -l nodes=x:ppn=8 are mandatory. In particular, note that Colosse requires jobs to use exactly 8 cores per node; partial-node scheduling is not permitted. Submission files for Colosse therefore always begin with:

File : submit_script.sh
#!/bin/bash
#PBS -N MyJob
#PBS -A abc-123-aa
#PBS -l walltime=300
#PBS -l nodes=1:ppn=8
cd "${PBS_O_WORKDIR}"


The line cd "${PBS_O_WORKDIR}" is required for the job to run in the directory from which it was submitted.
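A short sketch of this mechanism, with PBS_O_WORKDIR faked via a temporary directory so it can run outside of a job (the resource manager normally sets it to the directory where msub was invoked):

```shell
#!/bin/bash
# Jobs start in the home directory by default; 'cd ${PBS_O_WORKDIR}'
# moves them to the submission directory. PBS_O_WORKDIR is faked here
# with a temporary directory for illustration.
PBS_O_WORKDIR=${PBS_O_WORKDIR:-$(mktemp -d)}
cd "${PBS_O_WORKDIR}"
echo "now running in: $(pwd)"
```

After this line, relative paths in the script resolve against the submission directory rather than the home directory.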

Default output and error files

On Colosse, the output and error files for a job are, by default, ${MOAB_JOBID}.out and ${MOAB_JOBID}.err in the directory where the job was launched. You can change these file names using the -o and -e options.

Submission queue

It is not necessary to specify a submission queue on Colosse. The submission queue is determined automatically based on the walltime, the number of nodes, and possibly other parameters.

The ckpt attribute

Certain nodes on Colosse are reserved for jobs that use checkpointing (i.e. jobs which periodically save their state and can be resumed at a later time). When compute nodes are returned to production after maintenance or repair, the Colosse team tests them thoroughly to make sure they are stable, but testing time is limited and the nodes must eventually be put back into production. To make these nodes available as quickly as possible, we offer them to jobs that perform checkpointing. If your job uses checkpointing, you may therefore have access to hundreds of additional cores. Because these nodes have been extensively tested (memory, CPU and networking), the likelihood of failure is low.

To use these nodes, add the following line to your submission file:

#PBS -l gattr=ckpt
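
Checkpointing itself is the application's responsibility. A minimal Bash sketch of the idea records progress after each unit of work so that a restarted job resumes where it left off (the file name and loop are hypothetical):

```shell
#!/bin/bash
# Minimal checkpoint/restart sketch: resume from a saved iteration
# counter if a previous run left one behind.
ckpt=counter.ckpt
start=1
[ -f "$ckpt" ] && start=$(cat "$ckpt")

for ((i = start; i <= 5; i++)); do
    # ... one unit of real work would go here ...
    echo $((i + 1)) > "$ckpt"   # record the next iteration to run
done

next=$(cat "$ckpt")
echo "next iteration on a restart: $next"
rm -f "$ckpt"                   # clean up the demonstration file
```

If the job is killed mid-loop, resubmitting it (without the rm) picks up from the last recorded iteration instead of starting over.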

Instructions for specific nodes

Colosse has certain nodes with 48GB of RAM available associated with the feature '48g'. To request these nodes, add the following line to your submission script:

#PBS -l feature='48g'

colosse-info

On Colosse, the command colosse-info displays information on your group's usage and the overall usage of the system. For example,

[name@server $] colosse-info
 Your Rap IDs are: bwg-974-aa colosse-users exx-883-aa exx-883-ab
Total number of jobs currently running: 385
Total number of slots in use: 6848/7216 (94.00%). Reserved for checkpointing jobs: 496
You are currently using 0 slots for 0 job(s).
RAPI bwg-974-aa: 0 used cores / 10 allocated cores (recent history)
RAPI colosse-users: 5.73178 used cores / 77 allocated cores (recent history)
RAPI exx-883-aa: 0.00397147 used cores / 10 allocated cores (recent history)
RAPI exx-883-ab: 0 used cores / 10 allocated cores (recent history)


Frequently asked questions

See the page FAQ Moab Colosse, specific to Colosse.

Guillimin

