Eagle Power Maintenance Complete

Dec. 8, 2022

Electrical upgrades to the ESIF data center to prepare for Kestrel were successful, and power has been restored. Cluster maintenance and updates were also performed once the electrical work was complete, including the following changes:

A new limit of 10,000 pending jobs per user was added to the job scheduler.

The scheduler now handles job out-of-memory errors better through improved memory limit enforcement. This helps maintain node health and makes oom-kill job failures more apparent to job owners.

Routine software and firmware updates were also applied to the storage system and network equipment.

Eagle's job queue has been resumed. Eagle and supporting systems (login and DAV nodes, storage, and Globus) have returned to service and are available for use.

Please note that Eagle's filesystem has reached 91% capacity, which is a critical level. As we return to service, please help the Eagle community by removing data that you no longer need for your work.
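As one possible starting point for that cleanup, the sketch below lists files under a directory that have not been modified in a given number of days, largest first. It is an illustration only and not an NREL-provided tool: the function name `find_stale_files` and the 180-day default are assumptions chosen for the example, and the script only reports candidates and never deletes anything.

```python
import os
import stat
import time
from pathlib import Path


def find_stale_files(root, min_age_days=180, now=None):
    """Return (size_bytes, path) pairs for regular files under `root`
    whose last modification is older than `min_age_days`, largest first.

    The name and the 180-day default are hypothetical; adjust to your needs.
    """
    now = time.time() if now is None else now
    cutoff = now - min_age_days * 86400  # seconds in a day
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                info = path.lstat()  # lstat: do not follow symlinks
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if not stat.S_ISREG(info.st_mode):
                continue  # ignore symlinks, sockets, and other non-regular files
            if info.st_mtime < cutoff:
                stale.append((info.st_size, str(path)))
    stale.sort(reverse=True)  # largest files first
    return stale
```

Calling `find_stale_files` on one of your own project or scratch directories and reviewing the first few entries should surface the largest candidates for removal. Modification time is only a rough proxy for whether data is still needed, so please review the list before deleting anything.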

The Swift cluster has returned to service and is now available for use.

Vermilion is not yet available. Users of this system will receive a separate announcement when it has returned to service.


Thank you,

NREL HPC Operations Team