Planned Emergency Maintenance: August 1st, 2022
July 25, 2022
NREL's Eagle cluster will undergo a planned emergency outage beginning at 7:00 A.M. on Monday, August 1, 2022.
During this outage, we will rebalance the Lustre Object Storage Targets (OSTs) to their Object Storage Servers (OSSs) and rebuild the High Availability (HA) configuration for the Lustre file system.
During the recent campus power outage and system time, we lost two controllers that run the Object Storage Servers. We were able to repair one controller at the time, but replacement of the other was held up by an issue with its mirrored disks. We have now replaced both the internal disks and the controller. To put the controller back into operation and rebuild the HA, we need to unmount Lustre on all clients, including the login nodes.
Maintenance is expected to last several hours, and we plan to return Eagle to service the same business day. Logins will be disabled and all data in /projects, /shared-projects, /scratch, and /datasets will be unavailable during the outage.
We apologize for the inconvenient timing of this outage, but this critical maintenance should improve the performance and stability of Eagle and the Lustre file system.
NREL HPC Operations Team