Announcements
Read announcements for NREL's high-performance computing (HPC) system users.
Avoid Launching Large Jobs From /home
There have been system slowdowns on Eagle recently caused by users launching large jobs out of the /home filesystem, particularly jobs that call large conda environments. Please avoid launching large jobs from /home, and consider moving your conda environments to your /projects directory before launching a multi-node job. The /home filesystem is not designed for the high level of input/output operations that the Lustre-based /projects filesystem is built for. The slowdown caused by a large job running out of /home can dramatically increase that job's runtime, costing AUs (allocation units) and possibly causing timeout failures. It can also affect other users and their jobs through system slowdowns, timeouts, and module loading errors.
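If your conda environment currently lives in /home, one way to relocate it (a sketch; <project-handle> and the environment names are placeholders) is to clone or rebuild it under /projects and activate it by path:

    # Clone an existing environment from /home into /projects
    conda create --prefix /projects/<project-handle>/envs/myenv --clone myenv

    # Activate it by path rather than by name
    conda activate /projects/<project-handle>/envs/myenv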
Workaround for Windows SSH Users
May 4, 2022
Some people who use Windows 10/11 computers to ssh to Eagle from a Windows Command Prompt, PowerShell, or Visual Studio Code's SSH extension have received a new error message about a "Corrupted MAC on input" or "message authentication code incorrect." Here's how to work around the issue.
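One common workaround (a sketch, assuming a recent OpenSSH client on the Windows side; the specific algorithm choice here is an assumption) is to restrict the MAC algorithms the client offers, either per connection or in your ssh configuration:

    # One-off, on the command line
    ssh -m hmac-sha2-512 <username>@eagle.hpc.nrel.gov

    # Or persistently, in %USERPROFILE%\.ssh\config
    Host eagle.hpc.nrel.gov
        MACs hmac-sha2-512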
Eagle Login Node Etiquette
April 7, 2022
Eagle login nodes are shared resources that are heavily utilized. We have controls in place that limit per-user memory and CPU consumption and that ramp down a user's processes over time. We recommend that any sustained, heavy use of memory or CPU take place on compute nodes, where these limits are not in place. If you only need a node for an hour, nodes in the debug partition are available. Compiles and file operations are permitted on the login nodes, but we discourage multi-threaded operations and long, sustained operations against the file system. We cannot limit file system operations the way we limit memory and CPU, so if you slow the file system on a login node, you slow it for everyone on that node. Lastly, FastX, the remote windowing package on the ED (DAV) nodes, is a licensed product. When you are done using FastX, please log all the way out so licenses remain available for all users.
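For example, a one-hour interactive session on a debug node can be requested through Slurm (a sketch; <project-handle> is a placeholder for your allocation):

    # Request one debug node for an hour and work from a shell there
    salloc --account=<project-handle> --partition=debug --nodes=1 --time=01:00:00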
Changes to Slurm "srun" for Interactive Jobs
Feb. 3, 2022
During the recent system time, the Slurm job scheduler was upgraded. One side effect of the upgrade is a change in the way Slurm handles job steps internally in certain cases.
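As an illustration (a sketch, assuming this refers to the job-step behavior in recent Slurm releases, where steps no longer share resources by default), a helper step run alongside a main step inside one allocation may now need the --overlap flag:

    # Each srun inside an allocation launches a job step; steps now get
    # exclusive access to their resources, so a concurrent helper step
    # may need --overlap (program names here are placeholders):
    srun --overlap --ntasks=1 ./monitor.sh &
    srun --ntasks=36 ./solver
    wait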
Slurm Fairshare Refresher
May 7, 2021
FY21 saw the introduction of the "fairshare" priority algorithm in Eagle's job scheduler, Slurm. Queue times have been high during the Q2-Q3 rush and we've received some questions, so here's a quick refresher on fairshare and what it means for job scheduling.
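You can see your allocation's fairshare standing and how it feeds into a pending job's priority with standard Slurm tools (<project-handle> and <jobid> are placeholders):

    # Show fairshare usage and factors for an account
    sshare -l -A <project-handle>

    # Break down the priority components of a pending job
    sprio -j <jobid>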
Elevate Your Work With New Tracking for Advanced Computing in the NREL Publishing Tracker
March 3, 2021
There is a new question on the User Facilities & Program Areas page when you enter a publication into the Pub Tracker: “The High Performance Computing Facility was used to produce results or data used in this publication.” Please be sure to answer Yes to this question for work that made use of the HPC User Facility or other systems in the ESIF HPC Data Center. In addition, there are three new Program Areas under the Advanced Computing heading for tagging your publication: Cloud, HPC, and Visualization & Insight Center. Making use of this metadata will enable us to elevate your work through communications highlights, feature stories, and reporting to EERE.
More information about the NREL Publishing Tracker can be found by visiting the Access and Use the NREL Publishing Tracker page on the Source.
DAV Nodes
Aug. 21, 2019
The new DAV nodes, accessible through eagle-dav.hpc.nrel.gov, now have NVIDIA GV100 cards installed. These cards enable visualization functions that were previously unavailable. The NVIDIA GV100 provides state-of-the-art, AI-enhanced design and visualization capabilities with extreme memory capacity, scalability, and performance that researchers can use to create, build, and solve difficult and involved graphics problems.
Please contact HPC User Operations if you need any assistance or have questions in reference to the new functionality.
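After logging in, one quick way to confirm a GV100 is visible to your session (a minimal check, assuming standard NVIDIA drivers) is:

    ssh <username>@eagle-dav.hpc.nrel.gov
    nvidia-smi --query-gpu=name,memory.total --format=csv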
Node Use Efficiency
Aug. 21, 2019
When building batch scripts, it is advisable to first become familiar with the capabilities of the Eagle nodes. Keep in mind the memory capacities of the nodes and the types of cores available, and be aware that running multiple tasks on each node, or using job arrays, may help you use your node hours more effectively. Further, set your job's memory requirement based on the capabilities of the differing node types, so that your processes are matched to an appropriate node.
Some Slurm options that you might consider are listed below.
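A few commonly useful ones (a representative sketch; defaults and limits vary by partition and node type):

    --mem=<MB>               request a specific amount of memory per node
    --ntasks-per-node=<n>    run multiple tasks on each node
    --array=<range>          submit a job array of related tasks
    --partition=<name>       target a particular set of nodes

Put together in a batch script, they might look like this (<project-handle> and the program and input names are placeholders):

    #!/bin/bash
    #SBATCH --account=<project-handle>
    #SBATCH --time=04:00:00
    #SBATCH --nodes=1
    #SBATCH --ntasks-per-node=36   # fill all cores on a standard Eagle node
    #SBATCH --mem=80000            # memory in MB; steers the job to a suitable node type
    #SBATCH --array=0-9            # ten related tasks sharing one script
    srun ./my_app input_${SLURM_ARRAY_TASK_ID}.dat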