Running Batch Jobs on Peregrine
Batch jobs are run on Peregrine by submitting a job script to the scheduler. The script contains the commands needed to set up your environment and run your application.
To submit a job on Peregrine, use the Torque qsub command:
% qsub <batch_file> -A <project-handle>
The job script may contain directives for the job scheduler, each preceded by "#PBS". These specify resource limits such as wall-clock time and number of nodes, as well as the queue and the type of nodes the job should run on. The same options may also be given on the qsub command line; if both are present, command-line options take precedence over the corresponding options in the batch script.
All jobs must specify a project handle that is associated with a project allocation. This may be done with the -A option to qsub or as an option within the script. Jobs submitted without a project handle will be rejected; this prevents jobs with a misspelled project handle from sitting in the queue indefinitely.
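As an illustration, a minimal job script might look like the following. The project handle, executable name, queue name, and resource values are all placeholders; substitute your own.

```shell
#!/bin/bash
#PBS -A CSC000               # project handle to charge (placeholder)
#PBS -l nodes=2:ppn=24       # 2 nodes, 24 processes per node (illustrative)
#PBS -l walltime=00:04:00:00 # wall-clock limit, DD:HH:MM:SS
#PBS -j eo                   # join stderr and stdout in one file
#PBS -q batch                # queue name (site-specific)

cd $PBS_O_WORKDIR            # start in the directory the job was submitted from
mpirun -np 48 ./my_app       # my_app is a placeholder executable
```

Because the -A directive is inside the script, this job can be submitted simply with qsub my_job.sh.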
Program executable files may reside in any file system, but input and output files should be read from and written to the /scratch file system.
| Option | What it does |
|--------|--------------|
| -d execution_directory | tells Torque where the job should execute |
| -I (capital i) | run an interactive job |
| -X | forward X windows to your local system (undocumented flag) |
| -l (lowercase L) | resource limits (see below for more information) |
| -V | export environment variables to the batch job |
| -j eo | join stderr and stdout in one file |
| -A project-handle | tells Torque which project allocation should be charged for the job's node-hour usage |
| -q queue_name | submit the job to a specific queue |
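Several of these options are often combined on one command line. For example, an interactive session with X forwarding might be requested as follows (the project handle and resource values are placeholders):

```shell
# Request an interactive job (-I) with X11 forwarding (-X),
# charging the CSC000 allocation (placeholder handle)
qsub -I -X -A CSC000 -l nodes=1:ppn=24,walltime=02:00:00
```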
A variety of environment variables are made available for use by your script.
- The environment variable $PBS_O_WORKDIR is set to the location the job was submitted from.
- $PBS_NODEFILE points to a file containing a list of nodes allocated to the job.
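A sketch of how these variables might be used inside a job script, assuming the scheduler has set them (the defaults here are just to make the snippet safe to run outside a job):

```shell
# Move to the directory the job was submitted from
cd "${PBS_O_WORKDIR:-.}"

# $PBS_NODEFILE lists one hostname per allocated processor slot,
# so counting lines gives slots and counting unique lines gives nodes
NODEFILE="${PBS_NODEFILE:-/dev/null}"
NUM_SLOTS=$(wc -l < "$NODEFILE")
NUM_NODES=$(sort -u "$NODEFILE" | wc -l)
echo "Allocated $NUM_SLOTS slots across $NUM_NODES nodes"
```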
Resource limits, such as the number of nodes and the wall clock time, may be set with the -l option to qsub or #PBS -l in the job script.
- -l nodes=n:ppn=X says the job needs n nodes and should place X processes on each node
- -l walltime=DD:HH:MM:SS sets the wall clock limit for the job
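For instance, a 4-node job with 16 processes per node and a two-hour limit could be requested either on the command line or in the script; the values below are illustrative:

```shell
# On the qsub command line:
#   qsub -l nodes=4:ppn=16,walltime=00:02:00:00 -A <project-handle> my_job.sh
# Or equivalently inside the job script:
#PBS -l nodes=4:ppn=16
#PBS -l walltime=00:02:00:00
```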
Peregrine has several types of compute nodes, which differ in the amount of memory and number of processor cores. The majority of the nodes have 24 Xeon cores and 32 GB of memory but some have 24 Xeon cores and 64 GB of memory, some have 16 Xeon cores with 32 GB of memory and others have 16 Xeon cores with 256 GB of memory.
Users may request nodes of a particular type using the "feature" option in the resource limit specification of the job. By default, jobs will use the first node type found that is consistent with the job request.
If you request an inconsistent feature set (e.g., 16-core and 64GB), the job will not be scheduled and will remain queued indefinitely. If administrators notice such a job, they will attempt to contact the job owner to rectify the incompatibility and get the job running.
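A feature request might look like the following sketch; the exact feature names are site-specific assumptions, not confirmed values, so check Peregrine's node-type documentation before using them:

```shell
# Request one 16-core large-memory node; "256GB" is an assumed
# feature name for the 16-core, 256 GB node type
#PBS -l nodes=1:ppn=16,feature=256GB
```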
More information about requesting different node types in Peregrine is available.