System Resource Allocation Units
NREL uses an allocation unit (AU) to allocate and charge time used on its high-performance computing (HPC) systems.
NREL allocates time on compute nodes and space on its file systems and MSS (archive) system. For a job running at normal priority on Eagle, one AU is one-third of a node-hour. For comparison with other HPC systems, each Eagle node has a theoretical peak performance of 3,456 GigaFLOPS. Each Eagle node has 36 cores, but users request time in node-hours rather than core-hours.
Computing Allocation Unit Charges
AU charges are calculated for each job when the job completes. The cost of the job is computed using this formula:
Walltime in hours * Number of Nodes * QoS Factor * Charge Factor
Quality of Service Factor
The quality of service (QoS) factor reflects the priority given to a job. The default QoS factor for all jobs is 1, meaning the job runs at normal priority. A user can request high priority by adding "--qos=high" when the job is submitted. High priority gives the job a boost, so it runs sooner than other jobs in the queue. High-priority jobs have a QoS factor of 2, so they are charged at twice the normal rate.
The charge factor converts node-hours to AUs. For Eagle, the charge factor is 3, consistent with one AU being one-third of a node-hour.
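The charge formula can be sketched in Python as follows. This is a minimal illustration, not NREL's accounting code: the function name job_au_charge, the QOS_FACTORS table, and the label "normal" for the default QoS are hypothetical names chosen for this example. Only the factor values (1 for normal priority, 2 for high priority, and a charge factor of 3 for Eagle) come from the text.

```python
# Hypothetical helper illustrating the AU charge formula; not NREL's accounting code.

EAGLE_CHARGE_FACTOR = 3  # AUs per node-hour on Eagle

# QoS factors from the text; the key names are labels chosen for this sketch.
QOS_FACTORS = {"normal": 1, "high": 2}


def job_au_charge(walltime_hours, num_nodes, qos="normal",
                  charge_factor=EAGLE_CHARGE_FACTOR):
    """Return AUs charged: walltime * nodes * QoS factor * charge factor."""
    if walltime_hours < 0:
        raise ValueError("walltime_hours must be non-negative")
    if num_nodes < 1:
        raise ValueError("num_nodes must be at least 1")
    return walltime_hours * num_nodes * QOS_FACTORS[qos] * charge_factor


# A 10-hour job on 2 nodes at normal priority: 10 * 2 * 1 * 3 = 60 AUs.
print(job_au_charge(10, 2))
# The same job submitted with --qos=high is charged twice as much: 120 AUs.
print(job_au_charge(10, 2, qos="high"))
```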
Users are charged for an entire node, even if they only use one core on the node at a time. Users whose codes are not parallelized are encouraged to run arrays of jobs on one node, both to minimize their own use of AUs, and to keep Eagle nodes free for other users.
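To see why packing serial work onto one node matters, the sketch below compares the AU cost of running single-core tasks one per node against sharing nodes. It is an illustration under simplifying assumptions that are not from the text: every task takes the same time, all tasks placed on a node run concurrently, memory is not a constraint, and jobs run at normal priority. The function name aus_for_serial_tasks is hypothetical.

```python
import math

CORES_PER_NODE = 36      # cores per Eagle node (from the text)
EAGLE_CHARGE_FACTOR = 3  # AUs per node-hour on Eagle


def aus_for_serial_tasks(num_tasks, hours_per_task, tasks_per_node):
    """AUs charged when tasks_per_node single-core tasks share each node.

    Assumes tasks on a node run concurrently and take equal time, and that
    the job runs at normal priority (QoS factor 1).
    """
    if not 1 <= tasks_per_node <= CORES_PER_NODE:
        raise ValueError("tasks_per_node must be between 1 and CORES_PER_NODE")
    # Whole nodes are charged, so round the node count up.
    nodes_needed = math.ceil(num_tasks / tasks_per_node)
    return hours_per_task * nodes_needed * EAGLE_CHARGE_FACTOR


# 36 two-hour serial tasks, one per node: 2 * 36 * 3 = 216 AUs.
one_per_node = aus_for_serial_tasks(36, 2, tasks_per_node=1)
# The same tasks packed onto a single node: 2 * 1 * 3 = 6 AUs.
packed = aus_for_serial_tasks(36, 2, tasks_per_node=36)
print(one_per_node, packed)
```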
Estimating Allocation Units for Allocation Requests
When possible, run test jobs on NREL HPC systems before making an annual allocation request, both to confirm that the code is ready to go and to gauge how much time your jobs need. You can request a pilot allocation at any time.
The basic formula for estimating AUs is:
Per-Job walltime in hours * Number of Nodes * Charge Factor * Number of Runs anticipated
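A request often covers more than one kind of run, so the estimate can be summed over several planned sets of jobs. The sketch below applies the formula above to each set and adds the results. The function name estimate_request_aus and the example numbers are hypothetical. As written, the formula has no QoS factor, so the estimate corresponds to jobs run at the default QoS factor of 1.

```python
EAGLE_CHARGE_FACTOR = 3  # AUs per node-hour on Eagle


def estimate_request_aus(planned_runs, charge_factor=EAGLE_CHARGE_FACTOR):
    """Sum walltime * nodes * charge factor * runs over planned sets of jobs.

    planned_runs is an iterable of (walltime_hours_per_job, num_nodes, num_runs).
    """
    total = 0
    for walltime_hours, num_nodes, num_runs in planned_runs:
        total += walltime_hours * num_nodes * charge_factor * num_runs
    return total


# Made-up example plan:
#   50 runs of 4 hours on 8 nodes  -> 4 * 8 * 3 * 50  = 4800 AUs
#   20 runs of 12 hours on 2 nodes -> 12 * 2 * 3 * 20 = 1440 AUs
plan = [(4, 8, 50), (12, 2, 20)]
print(estimate_request_aus(plan))  # 6240
```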
Reasoning Behind Allocation Units
Giving out allocations in AUs, instead of node-hours or core-hours, lets NREL keep the definition of an allocation consistent as we move from HPC system to HPC system. This is a best practice that NREL borrowed from other HPC facilities in the U.S. Department of Energy system.
For fiscal year 2019, NREL users had both Eagle and Peregrine to choose from, and using AUs let users shift between the two machines easily. At some point, when Eagle is replaced, we expect to have access to two machines at the same time again. When this happens, NREL will assign a new charge factor to the new machine.
AUs are indexed to a node-hour on Peregrine, the first system deployed at the HPC User Facility within the Energy Systems Integration Facility. A Peregrine node (for purposes of the AU standard) was a 24-core Intel Xeon (Haswell) node with a theoretical peak performance of 883.2 GigaFLOPS. Eagle nodes have a theoretical peak performance of 3,456 GigaFLOPS, about 3.91x that of a Peregrine node.
A throughput metric establishes the overall performance of a new system relative to the previous system. For Eagle, the throughput metric involved running HPGMG, Nalu, and VASP benchmarks in proportion to a 12-month average of Peregrine workload. The result of the throughput benchmark test established that Eagle could run about 2.69x the workload of Peregrine per unit time.
A charge factor of 3 was selected as a simple integer between the throughput and theoretical performance ratios.
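The arithmetic behind this choice can be checked with a few lines of Python. The sketch uses only the figures quoted above; the variable names are chosen for this example.

```python
PEREGRINE_PEAK_GFLOPS = 883.2  # theoretical peak per Peregrine node
EAGLE_PEAK_GFLOPS = 3456.0     # theoretical peak per Eagle node
THROUGHPUT_RATIO = 2.69        # Eagle vs. Peregrine, from the throughput benchmark
EAGLE_CHARGE_FACTOR = 3        # AUs per Eagle node-hour

# Ratio of theoretical peak performance per node.
peak_ratio = EAGLE_PEAK_GFLOPS / PEREGRINE_PEAK_GFLOPS
print(f"Peak performance ratio: {peak_ratio:.2f}")

# The charge factor lies between the measured throughput ratio
# and the theoretical peak ratio.
in_range = THROUGHPUT_RATIO < EAGLE_CHARGE_FACTOR < peak_ratio
print(f"Charge factor between the two ratios: {in_range}")
```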