Running Batch Jobs on Eagle
Batch jobs are run on Eagle by submitting a job script to the scheduler. The script contains the commands needed to set up your environment and run your application.
To submit jobs on Eagle, the Slurm sbatch command should be used:
$ sbatch --account=<project-handle> <batch_script>
Scripts and program executables may reside in any file system, but input and output files should be read from and written to the /scratch file system. /scratch uses the Lustre file system, which is designed to take advantage of the parallel network fabric between Eagle nodes and provides much higher throughput for file operations.
Arguments to sbatch may be used to specify resource limits such as job duration (referred to as "walltime"), number of nodes, etc., as well as what hardware features you want your job to run with. These can also be supplied within the script itself by placing #SBATCH comment directives within the file.
For examples of implementations, please see our sample batch scripts.
Users familiar with job submissions to PBS on Peregrine may be interested in our PBS to Slurm Translation Sheet for quickly converting workflows over to Eagle. Also see the official Slurm Cheat Sheet produced by SchedMD, the developers of Slurm.
Note: Command-line arguments to sbatch must precede the batch script name or they will be ignored. Arguments supplied on the command line take precedence over duplicate #SBATCH directives in the script.
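As an illustration of the directive style, here is a minimal sketch of a batch script; the project handle, run directory, executable, and input file are placeholders you would replace with your own, not a prescribed workflow:

```bash
#!/bin/bash
#SBATCH --account=<project-handle>   # replace with your project handle
#SBATCH --time=01:00:00              # walltime of 1 hour
#SBATCH --nodes=2                    # request two nodes
#SBATCH --job-name=example_job

# Run from /scratch so input and output go to the Lustre file system
cd /scratch/$USER/example_job
srun <your-application> input.dat    # placeholder executable and input file
```

Any sbatch option given on the command line (for example, sbatch --time=02:00:00 example_job.sb) overrides the matching #SBATCH directive in the script.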
Job Submission Requirements
Parameter | How Specified | Example | Additional Information |
---|---|---|---|
Project Handle | --account or -A | --account=<project_handle> | Project handles are provided by HPC Operations at the beginning of an allocation cycle. |
Maximum Job Duration (Walltime) | --time or -t | --time=1-12:05:30 (1 day, 12 hours, 5 minutes, 30 seconds) | Recognized time formats: minutes, minutes:seconds, hours:minutes:seconds, days-hours, days-hours:minutes, and days-hours:minutes:seconds. |
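For example, a submission that supplies both required parameters on the command line might look like the following (csc000 is just a sample project handle):

$ sbatch --account=csc000 --time=1-12:05:30 <batch_script>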
Resource Request Descriptions
Parameter | How Specified | Example | Additional Information |
---|---|---|---|
Nodes / Tasks / MPI Ranks | --nodes or -N, --ntasks or -n, --ntasks-per-node | --nodes=2 --ntasks=40 --ntasks-per-node=20 | If --ntasks is specified, it is still important to indicate the number of nodes requested; this helps the scheduler place jobs on the fewest possible number of Ecells. The maximum number of tasks per node is 36. Note: the --tasks flag is not mentioned in the official documentation, but exists as an alias for --ntasks-per-node. |
Memory | --mem, --mem-per-cpu | --mem=50000 or --mem=700GB | --mem requests memory per node; --mem-per-cpu requests memory per task (MPI rank). |
Local Disk (/tmp/scratch) | --tmp | --tmp=20TB, --tmp=1200GB, --tmp=1000000 | Requests /tmp/scratch space in megabytes (default), GB, or TB. 1000000 MB = 1 TB. |
GPUs | --gres | --gres=gpu:2 | There are 44 GPU nodes, each with 2 NVIDIA Tesla V100 GPUs. |
Licenses (planned functionality) | --licenses | --licenses=COMSOL:1 | Use scontrol show lic to display available licenses. This functionality is not yet set up on Eagle; the implementation may differ. |
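As a sketch of how these resource requests can be combined in a script header, consider the following; the memory, disk, and walltime values are arbitrary illustrations, the executable is a placeholder, and --gres=gpu:2 applies only on the GPU nodes:

```bash
#!/bin/bash
#SBATCH --account=csc000        # sample project handle
#SBATCH --time=04:00:00
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=36    # up to 36 tasks per node
#SBATCH --mem=150000            # memory per node, in MB by default
#SBATCH --tmp=1200GB            # node-local /tmp/scratch space
#SBATCH --gres=gpu:2            # two V100 GPUs per node (GPU nodes only)

srun <your-mpi-application>     # placeholder executable
```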
Job Management and Output
Parameter | How Specified | Example | Additional Information |
---|---|---|---|
High Priority | --qos | --qos=high | Jobs that request --qos=high will generally jump to the top of the queue. Time used is charged at twice the normal rate. |
Job Dependencies | --dependency | --dependency=afterok:<job_id> | You can submit jobs that will wait until a condition is met before running. Conditions include after, afterany, afterok, afternotok, and singleton. |
Job Name | --job-name | --job-name=myjob | Helps you recognize a particular job in the queue and differentiate it from other jobs. |
Email Notification | --mail-user, --mail-type | --mail-user=<email_address> --mail-type=ALL | Slurm will send email updates on job state changes. --mail-type specifies which states generate an email and may be BEGIN, END, FAIL, or ALL. |
Output | --output, --error | --output=job_stdout --error=job_stderr | stdout defaults to slurm-<jobid>.out. stderr is written to the same file as stdout unless --error specifies otherwise. |
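As a sketch of how these job-management options work together, the following submits a post-processing job that waits for an earlier job to complete successfully; the job ID, email address, and script names are hypothetical:

```bash
$ sbatch --job-name=preprocess preprocess.sb
Submitted batch job 521837
$ sbatch --job-name=postprocess --dependency=afterok:521837 \
         --mail-user=user@example.com --mail-type=END \
         --output=postprocess-%j.out postprocess.sb
```

In the --output filename, %j is replaced by the job ID, so each run writes to its own log file.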
Commonly Used Slurm Environment Variables
Environment Variable | Semantic Meaning | Sample Value |
---|---|---|
$LOCAL_SCRATCH | Absolute directory path for large per-node disk space (nodes do not have access to each other's local scratch). This should always be /tmp/scratch across all Eagle nodes. | /tmp/scratch |
$SLURM_CLUSTER_NAME | Identical to $NREL_CLUSTER on our systems; it is the name given to the cluster in the configuration of the master Slurm daemon. | eagle |
$SLURM_CPUS_ON_NODE | Hyperthreading is disabled, so this should always be 36 CPUs per node across all of Eagle. | 36 |
$SLURMD_NODENAME | The name (as configured in Slurm) of the individual node evaluating the variable, which should match the hostname on our systems. | r4i2n3 |
$SLURM_JOB_ACCOUNT | The account used to submit the job, which should be your project's handle. | csc000 |
$SLURM_JOB_CPUS_PER_NODE | If a value for --cpus-per-node is specified for the job, this will reflect that. This should be <= 36. | 36 |
$SLURM_JOB_ID or $SLURM_JOBID | The job ID assigned to the job, so you can identify output related to just that job. | 521837 |
$SLURM_JOB_NAME | If the job was named, this will hold that name; otherwise it defaults to the command that was run in the job. | sh |
$SLURM_JOB_NODELIST or $SLURM_NODELIST | The hostnames of the nodes that participated in your job. There will be at least one, and contiguous nodes are often given in Slurm's range syntax, as in the sample value. | r4i2n[2-6] |
$SLURM_JOB_NUM_NODES or $SLURM_NNODES | The number of nodes requested for the job. | 5 |
$SLURM_JOB_PARTITION | The partition(s) the job was placed in. | short |
$SLURM_JOB_QOS | The Quality of Service specified by the job. | high |
$SLURM_JOB_USER | The username of the user who submitted the job. | ttester |
$SLURM_NODEID | For each individual node in the job, this is its unique index in the list of nodes: a value from 0 up to one less than the number of nodes requested. | 0 |
$SLURM_STEP_ID or $SLURM_STEPID | From within a job, sequential srun commands are called job "steps." Each call to srun increments this variable, giving each step its own unique ID index. This may be helpful for debugging, for example to see which step a job fails at. | 0 |
$SLURM_STEP_NODELIST | From within a job, calls to srun can specify differing numbers of nodes for the step. If your job requested 5 nodes and you used srun -N 3, this variable contains the list of the 3 nodes (out of the 5 allocated) that participated in this job step. | r4i2n[2-4] |
$SLURM_STEP_NUM_NODES | Like above, except this contains the number of nodes requested for the job step. | 3 |
$SLURM_STEP_NUM_TASKS | The number of tasks requested for the job step; defaults to the task count of the job request. | 1 |
$SLURM_STEP_TASKS_PER_NODE | The value specified by --tasks-per-node for the job step; defaults to the tasks-per-node of the job request. | 1 |
$SLURM_SUBMIT_DIR | The absolute path of the directory the job was submitted from. | /projects/csc000 |
$SLURM_SUBMIT_HOST | The hostname of the system the job was submitted from. Should always be an Eagle login node or an Eagle DAV node. | el1 |
$SLURM_TASKS_PER_NODE | The value specified by --tasks-per-node in the job request. | 1 |
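As a sketch of how some of these variables might be used inside a batch script, consider the following; the application name is a placeholder, and the /scratch destination is only an example path:

```bash
#!/bin/bash
#SBATCH --account=csc000
#SBATCH --nodes=1
#SBATCH --time=00:30:00

# Report basic job information from Slurm's environment variables
echo "Job $SLURM_JOB_ID submitted by $SLURM_JOB_USER from $SLURM_SUBMIT_DIR"
echo "Running on $SLURM_JOB_NUM_NODES node(s): $SLURM_JOB_NODELIST"

# Work in the node-local scratch space, then copy results back to /scratch
WORKDIR=$LOCAL_SCRATCH/$SLURM_JOB_ID
mkdir -p "$WORKDIR"
cd "$WORKDIR"
srun <your-application>                          # placeholder executable
cp -r "$WORKDIR" /scratch/$USER/results_$SLURM_JOB_ID
```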