Eagle Job Partitions and Scheduling Policies

Learn about job partitions and policies for scheduling jobs on Eagle.

Partitions

Eagle nodes are associated with one or more partitions. Each partition is defined by one or more job characteristics, including run time, per-node memory requirements, per-node local scratch disk requirements, and whether graphics processing units (GPUs) are needed.

Slurm automatically routes jobs to the appropriate partitions based on the node count, walltime, hardware features, and other resources specified in the submission. Jobs have access to the largest number of nodes, and therefore typically the shortest wait, when no partition is specified during job submission.
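For example, a submission like the following (using a hypothetical job script job.sh and a hypothetical allocation handle <project-handle>, which are placeholders rather than Eagle-specific values) lets Slurm choose the partition from the requested resources, while adding -p debug instead forces placement on the debug nodes:

  # Let Slurm route the job based on node count and walltime
  sbatch --account=<project-handle> --nodes=4 --time=4:00:00 job.sh

  # Explicitly request the debug partition for a short, 2-node test
  sbatch --account=<project-handle> --nodes=2 --time=1:00:00 -p debug job.sh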

The following table summarizes the partitions on Eagle.

debug
  Description: Nodes dedicated to developing and troubleshooting jobs. Debug nodes with each of the non-standard hardware configurations are available. The node-type distribution is:
    • 4 GPU nodes
    • 2 bigmem nodes
    • 7 standard nodes
    • 13 total nodes
  Limits: 1 job with a maximum of 2 nodes per user; 01:00:00 max walltime
  Placement condition: -p debug or --partition=debug

short
  Description: Nodes that prefer jobs with walltimes <= 4 hours
  Limits: No partition limit; no limit per user
  Placement condition: --time <= 4:00:00; --mem <= 85248 (1800 nodes) or --mem <= 180224 (720 nodes)

standard
  Description: Nodes that prefer jobs with walltimes <= 2 days
  Limits: 2100 nodes total; 1050 nodes per user
  Placement condition: --time <= 2-00; --mem <= 85248 (1800 nodes) or --mem <= 180224 (720 nodes)

long
  Description: Nodes that prefer jobs with walltimes > 2 days; the maximum walltime of any job is 10 days
  Limits: 525 nodes total; 262 nodes per user
  Placement condition: --time <= 10-00; --mem <= 85248 (1800 nodes) or --mem <= 180224 (720 nodes)

bigmem
  Description: Nodes that have 768 GB of RAM
  Limits: 90 nodes total; 45 nodes per user
  Placement condition: --mem > 180224

bigscratch
  Description: Nodes that each have a larger /tmp/scratch mount (24 TB SSD) for per-node large-data tasks
  Limits: 20 nodes total; 10 nodes per user
  Placement condition: --tmp > 1500000

gpu
  Description: Nodes with dual NVIDIA Tesla V100 PCIe 16 GB computational accelerators for GPU-based software
  Limits: 44 nodes total; 22 nodes per user; 2 GPUs per node
  Placement condition: --gres=gpu:1 (1 per node) or --gres=gpu:2 (2 per node); --time <= 2-00

gpul
  Description: Nodes with dual NVIDIA Tesla V100 PCIe 16 GB computational accelerators for GPU-based software
  Limits: 8 nodes total; 2 nodes per user; 2 GPUs per node
  Placement condition: --gres=gpu:1 (1 per node) or --gres=gpu:2 (2 per node); --time > 2-00

Memory (--mem) and local scratch (--tmp) values above are in megabytes, Slurm's default unit for those options.

Use the options listed above with the srun, sbatch, or salloc command, or in your job script, to specify the resources your job requires. Sample job scripts and the syntax for specifying the queue are available on the sample job scripts page.
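As a minimal sketch, assuming a hypothetical allocation handle <project-handle> and a placeholder executable (neither is an Eagle-specific value), a batch script requesting both GPUs on a single gpu-partition node could look like:

  #!/bin/bash
  #SBATCH --account=<project-handle>  # hypothetical allocation handle
  #SBATCH --nodes=1
  #SBATCH --time=1-00                 # 1 day; <= 2 days keeps the job eligible for the gpu partition
  #SBATCH --gres=gpu:2                # request both V100 GPUs on the node
  #SBATCH --job-name=gpu-example

  srun ./my_gpu_app                   # placeholder executable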

Job Scheduling Policies

The system configuration lists the four categories that Eagle nodes fall into based on their hardware features. No single user can have jobs running on more than half of the nodes in any one hardware category. For example, the maximum number of data analysis and visualization (DAV) nodes that a single user's jobs can occupy is 25.
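One way to see how many nodes your running jobs currently occupy, and so how close you are to a per-user limit, is the following command; it uses only standard squeue options and is not Eagle-specific:

  # Sum the node counts of all of your currently running jobs
  squeue -h -u $USER -t RUNNING -o "%D" | awk '{total += $1} END {print total}'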

Also learn how jobs are prioritized.

