May 7, 2021
The allocation year transitioned from Q2 to Q3 on April 1st. In the run-up to the end of Q2, the number of jobs submitted spiked sharply, and queue depth (job wait time) rose accordingly. A few projects saw some effect from fairshare, but much of the pressure came from more than a third of all jobs being submitted with qos=high. Because of the large surge in submitted jobs, interactions with fairshare, and a few projects that have used up their allocations, we have been analyzing the scheduling algorithms. Based on recommendations from SchedMD and our own internal analysis, we have made a few adjustments to the Slurm configuration. So far, those changes appear to have relieved some of the pressure on the queues and reduced the number of jobs submitted with qos=high.