Node Use Efficiency
August 21, 2019
When building batch scripts, first become familiar with the capabilities of the Eagle nodes. Keep in mind the memory capacity and core types of the different node configurations, and note that running multiple tasks on each node, or using job arrays, may help you use your node hours more effectively. Further, request memory appropriate to the node type so the scheduler can place your job on suitable hardware: https://www.nrel.gov/hpc/eagle-batch-jobs.html.
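As a sketch of the job-array approach, the script below runs the same program over ten input files, one array task each. The account name, program, and input file names are hypothetical; substitute your own allocation handle and workload.

```shell
#!/bin/bash
#SBATCH --account=myaccount       # hypothetical allocation handle
#SBATCH --time=00:30:00
#SBATCH --nodes=1
#SBATCH --array=0-9               # ten array tasks, indices 0..9
#SBATCH --output=array_%A_%a.out  # %A = job ID, %a = array task index

# Each array task processes its own input file (hypothetical names),
# so one submission covers all ten runs.
srun ./my_program input_${SLURM_ARRAY_TASK_ID}.dat
```

Submitted once with sbatch, this queues ten independent tasks, which is often easier on both you and the scheduler than ten separate job scripts.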
Some Slurm options that you might consider are:
--ntasks=<number> - When used with sbatch, this option advises the Slurm controller that job steps run within the allocation will launch a maximum of number tasks and to provide for sufficient resources. The default is one task per node, but note that the --cpus-per-task option will change this default. When used within a job allocation, this option will specify the number of tasks to run per step.
--tasks-per-node=<n> - Specify the number of tasks to be launched per node.
--ntasks-per-node=<ntasks> - Request that ntasks be invoked on each node. If used with the --ntasks option, the --ntasks option will take precedence and the --ntasks-per-node will be treated as a maximum count of tasks per node. Meant to be used with the --nodes option. This is related to --cpus-per-task=ncpus, but does not require knowledge of the actual number of cpus on each node. In some cases, it is more convenient to be able to request that no more than a specific number of tasks be invoked on each node.
--mem=<size[units]> - Specify the real memory required per node. Default units are megabytes.
--cpus-per-task=<ncpus> - Request that ncpus be allocated per process. This may be useful if the job is multithreaded and requires more than one CPU per task for optimal performance. The default is one CPU per process.
--exclusive - This option can be used when initiating more than one job step within an existing resource allocation, where you want separate processors to be dedicated to each job step. If sufficient processors are not available to initiate the job step, it will be deferred. This can be thought of as providing a mechanism for resource management to the job within its allocation.
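Putting these options together, here is a sketch of packing several concurrent job steps into a single allocation on one Eagle node (36 cores). The account name, program, and core counts are illustrative only.

```shell
#!/bin/bash
#SBATCH --account=myaccount   # hypothetical allocation handle
#SBATCH --time=01:00:00
#SBATCH --nodes=1
#SBATCH --ntasks=4            # four tasks in the allocation
#SBATCH --cpus-per-task=9     # 4 tasks x 9 CPUs = 36 cores on the node
#SBATCH --mem=90000           # real memory per node, in megabytes

# Launch four independent job steps in the background; --exclusive
# dedicates separate processors to each step within the allocation.
for i in 1 2 3 4; do
    srun --exclusive --ntasks=1 --cpus-per-task=9 ./my_task $i &
done
wait   # do not exit until all background job steps finish
```

The trailing `wait` matters: without it the batch script exits as soon as the steps are launched, and Slurm ends the allocation.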
To determine the resources your intended job requires, consider benchmarking a representative run on a debug node, then use that measurement to size your production jobs and make more efficient use of the nodes' capabilities: https://www.nrel.gov/hpc/eagle-job-partitions-scheduling.html. Remember that input and output files should be read from and written to the /scratch file system.
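One way to take that measurement after a benchmark run completes is Slurm's accounting command, assuming job accounting is enabled on the system (the job ID below is a placeholder):

```shell
# Report elapsed time, peak resident memory, and CPU allocation
# for a finished job; replace 123456 with your benchmark job's ID.
sacct -j 123456 --format=JobID,Elapsed,MaxRSS,NTasks,AllocCPUS
```

The MaxRSS column in particular tells you what to pass to --mem on subsequent runs, with some headroom added.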