Running Batch Jobs on Eagle

Batch jobs are run on Eagle by submitting a job script to the scheduler. The script contains the commands needed to set up your environment and run your application.

To submit jobs on Eagle, use the Slurm sbatch command:

$ sbatch --account=<project-handle> <batch_script>

Scripts and program executables may reside in any file system, but input and output files should be read from and written to the /scratch file system. /scratch uses the Lustre file system, which is designed to take advantage of the parallel networking fabric between Eagle nodes and delivers much higher throughput for file operations.

Arguments to sbatch may be used to specify resource limits such as job duration (referred to as "walltime"), number of nodes, and so on, as well as the hardware features your job requires. These can also be supplied within the script itself by placing #SBATCH comment directives in the file.

For examples of implementations, please see our sample batch scripts.
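As a minimal sketch of the overall structure (the project handle, module name, and application name below are placeholders; the sample scripts linked above are the authoritative reference):

    #!/bin/bash
    #SBATCH --account=<project_handle>   # project handle provided by HPC Operations
    #SBATCH --time=4:00:00               # 4-hour walltime limit
    #SBATCH --nodes=2                    # request 2 nodes
    #SBATCH --job-name=my_simulation     # name shown in the queue

    # Set up the environment, then run from /scratch for the best file I/O throughput.
    module purge
    module load mpi                      # placeholder module name
    cd /scratch/$USER/my_simulation
    srun ./my_app input.dat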

Users familiar with job submissions to PBS on Peregrine may be interested in our PBS to Slurm Translation Sheet for quickly converting workflows over to Eagle. Also see the official Slurm Cheat Sheet produced by SchedMD, the developers of Slurm.

Note: Command-line arguments must precede the batch script name or they will be ignored. When the same argument is supplied both ways, the command-line value takes precedence over the #SBATCH directive.
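For example, if a script already contains the directive #SBATCH --time=4:00:00, submitting it as follows (the script name is a placeholder) would run it with a one-hour limit instead:

    $ sbatch --account=<project_handle> --time=1:00:00 my_job.sh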

Job Submission Requirements

Project Handle
  • How specified: --account or -A
  • Example: --account=<project_handle> or -A <project_handle>
  • Project handles are provided by HPC Operations at the beginning of an allocation cycle.
Maximum Job Duration (Walltime)
  • How specified: --time or -t
  • Examples: --time=1-12:05:30 (1 day, 12 hours, 5 minutes, 30 seconds) or -t5 (5 minutes)
  • Recognized time formats:
      • <days>-<hours>
      • <days>-<hours>:<min>
      • <days>-<hours>:<min>:<sec>
      • <hours>:<min>:<sec>
      • <min>:<sec>
      • <min>
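As a quick illustration of these formats (my_job.sh is a placeholder), each of the following requests two hours of walltime:

    $ sbatch --account=<project_handle> --time=2:00:00 my_job.sh
    $ sbatch --account=<project_handle> --time=0-2 my_job.sh
    $ sbatch --account=<project_handle> -t 120 my_job.sh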

Resource Request Descriptions

Nodes / Tasks / MPI Ranks
  • How specified: --nodes or -N; --ntasks or -n; --ntasks-per-node
  • Examples: --nodes=2, --ntasks=40, --ntasks-per-node=20
  • If ntasks is specified, it is still important to indicate the number of nodes requested. This helps the scheduler place jobs on the fewest possible number of Ecells required for that job.
  • The maximum number of tasks that can be assigned per node is 36.
  • Note: the --tasks flag is not mentioned in the official documentation, but exists as an alias for --ntasks-per-node.

Memory
  • How specified: --mem (memory per node); --mem-per-cpu (memory per task / MPI rank)
  • Examples: --mem=50000 or --mem 700GB

Local Disk (/tmp/scratch)
  • How specified: --tmp
  • Examples: --tmp=20TB, --tmp=1200GB, --tmp 1000000
  • Requests /tmp/scratch space in megabytes (the default), GB, or TB. Note that 1000000 MB = 1 TB.

GPUs
  • How specified: --gres
  • Example: --gres=gpu:2
  • There are 44 GPU nodes in the queue, each with 2 NVIDIA Tesla V100 GPUs.

Licenses (planned functionality)
  • How specified: --licenses
  • Example: --licenses=COMSOL:1
  • Use scontrol show lic to display which licenses are available.
  • This functionality is not yet set up on Eagle; the eventual implementation may differ.
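A sketch combining several of these requests in one job script header, reusing example values from the entries above (adjust to your application's actual needs):

    #SBATCH --nodes=2                # two nodes
    #SBATCH --ntasks-per-node=20     # 20 tasks (MPI ranks) per node
    #SBATCH --mem=50000              # ~50 GB of memory per node (MB by default)
    #SBATCH --tmp=1200GB             # 1200 GB of /tmp/scratch local disk per node
    #SBATCH --gres=gpu:2             # 2 GPUs per node (GPU nodes only)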

Job Management and Output

High Priority
  • How specified: --qos
  • Example: --qos=high
  • Jobs that request qos=high will generally jump to the top of the queue. Time used will be charged at twice the normal rate.
Job Dependencies
  • How specified: --dependency
  • Example: --dependency=<condition>:<job_id>
  • You can submit jobs that will wait until a condition is met before running. The condition determines when the dependent job becomes eligible to run:
      • after: after the listed jobs have started
      • afterany: after the listed jobs have finished
      • afternotok: after the listed jobs have failed
      • afterok: after the listed jobs have returned exit code 0
      • singleton: after all existing jobs with the same name and user have ended
Job Name
  • How specified: --job-name
  • Example: --job-name=myjob
  • Helps you recognize a particular job in the queue and differentiate it from other jobs.
Email Notification
  • How specified: --mail-user, --mail-type
  • Examples: --mail-user=my.email@some.org, --mail-type=ALL
  • Slurm will send email updates to the given address on job state changes. The type specifies which state changes generate an email; it may be BEGIN, END, FAIL, or ALL.
Output
  • How specified: --output, --error
  • Examples: --output=job_stdout, --error=job_stderr
  • stdout defaults to slurm-<jobid>.out. stderr is written to the same file as stdout unless specified otherwise.
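As an illustration of several of these options together (the job ID shown and the script names are hypothetical), a post-processing job can be held until the simulation it depends on finishes successfully:

    $ sbatch --account=<project_handle> --job-name=sim --mail-user=my.email@some.org --mail-type=END simulate.sh
    Submitted batch job 521837
    $ sbatch --account=<project_handle> --dependency=afterok:521837 --output=postprocess.out postprocess.sh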

Commonly Used Slurm Environment Variables

Each variable below is listed with its semantic meaning and a sample value.

$LOCAL_SCRATCH: Absolute directory path for large per-node disk space (nodes do not have access to each other's local scratch). This should always be /tmp/scratch across all Eagle nodes. Sample value: /tmp/scratch
$SLURM_CLUSTER_NAME: Identical to $NREL_CLUSTER on our systems; this is the name given to the cluster in the configuration of the master Slurm daemon. Sample value: eagle
$SLURM_CPUS_ON_NODE: We have disabled hyperthreading, so this should always be 36 CPUs per node across all of Eagle. Sample value: 36
$SLURMD_NODENAME: The name (as configured in Slurm) of the individual node evaluating the variable, which should match the hostname on our systems. Sample value: r4i2n3
$SLURM_JOB_ACCOUNT: The account used to submit the job, which should be your project's handle. Sample value: csc000
$SLURM_JOB_CPUS_PER_NODE: If a value for --cpus-per-node is specified for the job, this will reflect it. It should be <= 36. Sample value: 36
$SLURM_JOB_ID (or $SLURM_JOBID): The job ID assigned to the job, so you can identify output related to just that job. Sample value: 521837
$SLURM_JOB_NAME: If the job was named, this holds that name; otherwise it defaults to the command that was run in the job. Sample value: sh
$SLURM_JOB_NODELIST (or $SLURM_NODELIST): The hostnames of the nodes that participated in your job. There will be at least one, and the list is often given in Slurm's range syntax for contiguous nodes, as in the sample value. Sample value: r4i2n[2-6]
$SLURM_JOB_NUM_NODES (or $SLURM_NNODES): The quantity of nodes requested for the job. Sample value: 5
$SLURM_JOB_PARTITION: The partition(s) the job was placed in. Sample value: short
$SLURM_JOB_QOS: The Quality of Service specified by the job. Sample value: high
$SLURM_JOB_USER: The username of the user who submitted the job. Sample value: ttester
$SLURM_NODEID: For each individual node in the job, its unique index in the list of nodes. This should be a value between 0 and the node quantity requested. Sample value: 0
$SLURM_STEP_ID (or $SLURM_STEPID): From within a job, sequential srun commands are called job "steps." Each call to srun increments this variable, giving each step its own unique ID index. This may be helpful for debugging, for example to see which step the job fails at. Sample value: 0
$SLURM_STEP_NODELIST: From within a job, calls to srun can specify differing numbers of nodes for the step. If your job requested 5 nodes and you used srun -N 3, this variable would contain the list of the 3 nodes (of the 5 allocated to the job) that participated in this job step. Sample value: r4i2n[2-4]
$SLURM_STEP_NUM_NODES: Like the above, except this contains the quantity of nodes requested for the job step. Sample value: 3
$SLURM_STEP_NUM_TASKS: The quantity of tasks requested for the job step; defaults to the task count of the job request. Sample value: 1
$SLURM_STEP_TASKS_PER_NODE: The value specified by --tasks-per-node for the job step; defaults to the tasks-per-node of the job request. Sample value: 1
$SLURM_SUBMIT_DIR: The absolute path of the directory the job was submitted from. Sample value: /projects/csc000
$SLURM_SUBMIT_HOST: The hostname of the system the job was submitted from. This should always be an Eagle login node or an Eagle DAV node. Sample value: el1
$SLURM_TASKS_PER_NODE: The value specified by --tasks-per-node in the job request. Sample value: 1
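A minimal sketch of how a few of these variables might be used inside a job script (the application name is a placeholder):

    #!/bin/bash
    #SBATCH --account=<project_handle>
    #SBATCH --time=00:30:00
    #SBATCH --nodes=1
    #SBATCH --tmp=100GB

    echo "Job $SLURM_JOB_ID ($SLURM_JOB_NAME) submitted from $SLURM_SUBMIT_DIR"
    echo "Running on node(s): $SLURM_JOB_NODELIST"

    # Work in the node-local scratch space, then copy results back
    # to the directory the job was submitted from.
    cd $LOCAL_SCRATCH
    srun $SLURM_SUBMIT_DIR/my_app > my_app_$SLURM_JOB_ID.log
    cp my_app_$SLURM_JOB_ID.log $SLURM_SUBMIT_DIR/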