Slurm Fairshare Refresher

May 7, 2021

FY21 saw the introduction of the "fairshare" priority algorithm in Slurm, Eagle's job scheduler. Queue times have been high during the Q2-Q3 rush and we've received some questions, so here's a quick refresher on fairshare and what it means for job scheduling.

The fairshare algorithm is part of Slurm's "multi-factor priority" plugin, which determines when a job should run. It is designed to moderate queue usage: jobs from under-utilized allocations are promoted, while jobs from over-utilized allocations are shifted toward CPU time that would otherwise sit idle. An allocation's base fairshare value is determined by the number of AUs allocated to the project and is currently recalculated quarterly. Every job that runs lowers its allocation's fairshare value, which reduces the priority of that allocation's future jobs. Larger jobs have a larger impact; smaller jobs have less. A job's effect on the fairshare value decays by half every two weeks. Most importantly, fairshare accounts for only about half of the job priority calculation: the rest comes from other factors, including the job's size, QOS setting, and partition.
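To make the moving parts concrete, here is a minimal Python sketch of how these pieces could fit together. It assumes Slurm's classic fair-share formula, F = 2^(-U/S), where U is an allocation's share of recent (decayed) cluster usage and S is its share of allocated AUs. Slurm's default Fair Tree algorithm ranks allocations differently, and this post doesn't say which variant Eagle uses, so treat this as an illustration rather than Eagle's actual configuration. The two-week half-life comes from the text above; the function names, weights, and usage numbers are hypothetical, and the account hierarchy is ignored.

```python
import math

# From the post: a job's effect on fairshare halves every two weeks.
HALF_LIFE_DAYS = 14.0


def decayed_usage(jobs, now_day, half_life_days=HALF_LIFE_DAYS):
    """Hypothetical helper: total core-hours, each job weighted by 0.5 ** (age / half-life).

    `jobs` is a list of (end_day, core_hours) tuples for one allocation.
    """
    total = 0.0
    for end_day, core_hours in jobs:
        age_days = max(0.0, now_day - end_day)
        total += core_hours * 0.5 ** (age_days / half_life_days)
    return total


def fairshare_factor(usage_norm, shares_norm):
    """Classic Slurm fair-share formula F = 2 ** (-U / S), a value in (0, 1].

    usage_norm (U): the allocation's fraction of decayed cluster usage.
    shares_norm (S): the allocation's fraction of all allocated AUs.
    """
    if shares_norm <= 0.0:
        return 0.0
    return 2.0 ** (-usage_norm / shares_norm)


def job_priority(factors, weights):
    """Weighted sum of factors in [0, 1], in the style of the multifactor plugin."""
    return sum(weight * factors.get(name, 0.0) for name, weight in weights.items())


# Hypothetical weights, chosen so fairshare supplies at most half of the
# maximum possible priority (10000 out of 20000).
WEIGHTS = {"fairshare": 10000, "job_size": 4000, "qos": 3000, "partition": 3000}

if __name__ == "__main__":
    # Two hypothetical projects, each holding 10% of all AUs (S = 0.10).
    light = decayed_usage([(0, 2_000)], now_day=28)    # 2,000 core-hours, 4 weeks ago
    heavy = decayed_usage([(20, 40_000)], now_day=28)  # 40,000 core-hours, 8 days ago
    cluster_total = 500_000.0  # hypothetical decayed usage of the whole cluster
    for name, usage in (("light", light), ("heavy", heavy)):
        f = fairshare_factor(usage / cluster_total, 0.10)
        others = {"job_size": 0.5, "qos": 0.5, "partition": 0.5}
        p = job_priority({"fairshare": f, **others}, WEIGHTS)
        print(f"{name}: usage={usage:.0f} fairshare={f:.3f} priority={p:.0f}")
```

With identical size, QOS, and partition factors, the heavy user's job ends up with lower priority than the light user's job. Because fairshare is only one weighted term, though, a favorable QOS or partition factor can still make up part of the difference.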