Torque

From the Calcul Québec wiki

Torque is open-source job submission software maintained by Adaptive Computing. It is an implementation of PBS (Portable Batch System) derived from OpenPBS. It works together with a scheduler (usually Maui or Moab), which applies the local scheduling policies, that is, decides which job runs next. All of Calcul Québec's servers use Torque.

Operating principle

Torque follows a client-server model and consists of several parts:

  • "pbs_server", the server, which is always running to deal with client requests;
  • the scheduler mentioned above, which is also always running;
  • "pbs_mom", which runs on the compute nodes. It follows the server's instructions: it starts and ends jobs, cleans up the nodes after a job, and manages the resources used by jobs;
  • the client commands, which send requests to the server.

Main commands

From a user's perspective, the commands are, from most to least important:

qsub 
submits a job
qstat 
displays the list of jobs
qselect 
lists the IDs of jobs matching given search criteria
qdel 
deletes an idle or running job
qalter 
modifies an idle job
qmove 
moves one or more jobs to a different queue
pbsnodes 
lists the nodes and their features and resources
qhold 
puts a job on hold
qrls 
releases a hold
qsig 
sends a signal to a job
qrerun 
kills a job and puts it back in queue
qchkpt 
sends a checkpoint signal to the job
pbsdsh 
runs many instances of a command within a Torque job
tracejob 
displays data about a job from the Torque log files
momctl 
manages and diagnoses the Torque daemon on the compute nodes (pbs_mom) (requires administrator privileges)
qmgr 
manages local policies of Torque (requires administrator privileges)
qrun 
starts a job immediately (requires administrator privileges)
qterm 
stops the Torque server (pbs_server) (requires administrator privileges)
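
As an illustration of how the client commands are typically driven from a script, here is a minimal Python sketch that assembles a qsub command line and splits the job ID that qsub prints. The helper names (build_qsub_command, parse_job_id) and the script name my_job.sh are hypothetical. The sketch assumes the common "-N", "-l nodes=...:ppn=..." and "-q" options and a job ID of the form "number.server", as in the example 12345.egeon2 used on this page. Nothing is submitted: the command is only built and printed.

```python
import shlex


def build_qsub_command(script, name=None, nodes=None, ppn=None, queue=None):
    """Return the argument list for a qsub call (hypothetical helper).

    Nothing is executed here; the list can be passed to subprocess.run
    on a machine where the Torque client commands are installed.
    """
    cmd = ["qsub"]
    if name is not None:
        cmd += ["-N", name]  # job name
    if nodes is not None:
        resource = f"nodes={nodes}"
        if ppn is not None:
            resource += f":ppn={ppn}"  # cores per node
        cmd += ["-l", resource]
    if queue is not None:
        cmd += ["-q", queue]  # destination queue
    cmd.append(script)
    return cmd


def parse_job_id(qsub_output):
    """Split a job ID such as '12345.egeon2' into (number, server).

    Assumption: the ID has the form 'number.server'. Both parts are
    returned as strings, so other ID shapes are not rejected.
    """
    number, _, server = qsub_output.strip().partition(".")
    return number, server


if __name__ == "__main__":
    command = build_qsub_command(
        "my_job.sh", name="my_job", nodes=20, ppn=8, queue="short"
    )
    print(shlex.join(command))
    print(parse_job_id("12345.egeon2\n"))
```

Building the command as a list rather than as a single string avoids shell quoting problems if it is later passed to subprocess.run.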

Environment variables

Torque adds some variables to the job's environment. You may use these variables in your submission script. For example, the environment variables HOME, LANG, LOGNAME, PATH, MAIL, SHELL and TZ, as defined on the submit host, become PBS_O_HOME, PBS_O_LANG, PBS_O_LOGNAME, PBS_O_PATH, PBS_O_MAIL, PBS_O_SHELL and PBS_O_TZ, respectively, so that they do not conflict with the variables of the same name on the compute nodes. The following table lists the other variables that Torque defines, according to its documentation:

Name of the variable | Description | Example
PBS_ENVIRONMENT | Either "PBS_INTERACTIVE" or "PBS_BATCH", depending on whether the -I flag was given at submission | PBS_BATCH
PBS_JOBID | Job ID | 12345.egeon2
PBS_JOBNAME | Name of the job, as defined with the "-N" option | my_job
PBS_NODEFILE | Name of a file containing the list of nodes allocated to the job. Each node appears as many times as the number of cores allocated to the job on that node. | /var/spool/pbs/aux/12345.egeon2
PBS_O_HOST | Name of the host where qsub was run | colosse1
PBS_O_QUEUE | Name of the queue to which the job was submitted | short
PBS_O_WORKDIR | Directory from which the job was submitted | /home/user/scripts_pbs
PBS_QUEUE | Name of the queue in which the job is running | qtest
PBS_SERVER | Name of the host on which the Torque server (pbs_server) runs | egeon2
PBS_VERSION | Version of PBS | TORQUE-2.5.3
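
The following Python sketch shows one way a job could use these variables. It counts the cores allocated on each node by reading the file named by PBS_NODEFILE, relying on the fact stated above that each node appears once per allocated core, and it collects the PBS_O_* variables copied from the submit host. The function names are hypothetical, and the script prints a notice instead of failing when it runs outside a Torque job.

```python
import os
from collections import Counter


def cores_per_node(nodefile_path):
    """Count how many times each host name appears in the node file.

    Each node is listed once per allocated core, so the count is the
    number of cores allocated to the job on that node.
    """
    with open(nodefile_path) as handle:
        hosts = [line.strip() for line in handle if line.strip()]
    return Counter(hosts)


def submit_host_environment(environ=os.environ):
    """Return the PBS_O_* variables, keyed by name without the prefix.

    This also picks up PBS_O_HOST, PBS_O_QUEUE and PBS_O_WORKDIR,
    since they share the same prefix.
    """
    prefix = "PBS_O_"
    return {
        key[len(prefix):]: value
        for key, value in environ.items()
        if key.startswith(prefix)
    }


if __name__ == "__main__":
    nodefile = os.environ.get("PBS_NODEFILE")
    if nodefile is None:
        print("PBS_NODEFILE is not set: not running inside a Torque job")
    else:
        for host, cores in sorted(cores_per_node(nodefile).items()):
            print(f"{host}: {cores} core(s)")
        print(submit_host_environment())
```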

Torque also defines the following undocumented variables:

Name of the variable | Description | Example
PBS_JOBCOOKIE | | 05C7098F5411BD5042ED0FAC7B2D139A
PBS_MOMPORT | Port number used by "pbs_mom" | 15003
PBS_NODENUM | Node number (example with "-l nodes=20:ppn=8"); available for processes launched with pbsdsh and mpiexec | 0 to 19
PBS_NUM_NODES | Number of nodes requested by the job (example with "-l nodes=20:ppn=8"). Does not work on Cottos. | 20
PBS_NUM_PPN | Number of cores per node requested by the job (example with "-l nodes=20:ppn=8"). Does not work on Cottos. | 8
PBS_TASKNUM | Task number, for processes launched with pbsdsh and mpiexec | 1
PBS_VNODENUM | Virtual node number (example with "-l nodes=20:ppn=8"); for processes launched with pbsdsh and mpiexec | 0 to 159
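
The example values in this table are related by simple arithmetic: with "-l nodes=20:ppn=8", there are 20 × 8 = 160 virtual nodes, numbered 0 to 159. The Python sketch below derives these ranges from PBS_NUM_NODES and PBS_NUM_PPN. It assumes a uniform request of the form "nodes=N:ppn=P", as in the table's example, and the function name job_layout is hypothetical. Because these variables are undocumented and are not set on every system, the function returns None when they are missing.

```python
import os


def job_layout(environ=os.environ):
    """Derive node and virtual-node ranges from the Torque variables.

    Assumes a uniform request such as '-l nodes=20:ppn=8'. Returns
    None if PBS_NUM_NODES or PBS_NUM_PPN is not defined.
    """
    num_nodes = environ.get("PBS_NUM_NODES")
    num_ppn = environ.get("PBS_NUM_PPN")
    if num_nodes is None or num_ppn is None:
        return None
    num_nodes, num_ppn = int(num_nodes), int(num_ppn)
    total_cores = num_nodes * num_ppn
    return {
        "nodes": num_nodes,
        "ppn": num_ppn,
        "total_cores": total_cores,
        "nodenum_range": (0, num_nodes - 1),     # PBS_NODENUM values
        "vnodenum_range": (0, total_cores - 1),  # PBS_VNODENUM values
    }


if __name__ == "__main__":
    layout = job_layout()
    if layout is None:
        print("PBS_NUM_NODES / PBS_NUM_PPN are not set on this system")
    else:
        print(layout)
```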

Other references

Running jobs on Calcul Québec's supercomputers

Example script documenting qsub's options

Example scripts for multiple serial simulations, OpenMP jobs, and hybrid jobs

Documentation website from Adaptive Computing

Complete Torque documentation

Appendices of the Torque documentation

Overview of Torque commands
