System Transition from Peregrine to Eagle
Learn about migrating your workflow from Peregrine to the latest high-performance computing (HPC) system—Eagle.
To address feedback from the HPC user community, HPC Operations has acquired a state-of-the-art system. The new system, Eagle, was configured with features that better accommodate the needs expressed by our users. Eagle also comes with improvements to the operating environment that we believe will further streamline the HPC user experience compared to previous systems. Please see the sections below for guidance on getting started on Eagle.
The HPC Operations team has held workshops providing live assistance with the transition to Eagle, and is developing similar sessions to help users get the most out of HPC resources. The materials used during these presentations are available here:
Please see our schedule of upcoming workshops if you are interested in attending a session hosted by members of HPC Operations who are familiar with the nuances of Eagle's software environment.
Users who had access to Peregrine will be able to access Eagle using the same username and password.
Internal users connected to the NREL network either onsite with an NREL device or offsite via VPN can access Eagle using these domain names:
| Login Node | DAV Node |
|------------|----------|
External users connecting directly with the SSH protocol and a One-Time Password token may use these domain names without connecting to the NREL HPC VPN:
| Login Node | DAV Node |
|------------|----------|
Sharing RSA Public Keys Between Systems for Ease of Access
First and foremost, please do not run ssh-keygen while logged into HPC systems. SSH keys are generated for your account automatically the first time you log in. This mechanism only detects the absence of your RSA keys, not any modification. If you generate new keys, the modified keys will not be propagated, and you will not be able to log in to compute nodes in any of your jobs.
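If you want to confirm that your keys are in place, a check like the following is safe because it only reads the key files; the function name and the default key filenames (id_rsa, id_rsa.pub) are assumptions for illustration:

```shell
# check_keys reports whether an RSA key pair exists in a .ssh directory
# without regenerating anything. Never run ssh-keygen on the HPC systems.
check_keys() {
  dir=${1:-$HOME/.ssh}
  for f in "$dir/id_rsa" "$dir/id_rsa.pub"; do
    if [ -e "$f" ]; then
      echo "found: $f"
    else
      echo "missing: $f"
    fi
  done
}

check_keys   # inspect your own ~/.ssh
```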
You can access Eagle from Peregrine and vice versa. This is often done to transfer files with commands such as scp or rsync. You will likely find having to type in your password each time you access one system from the other to be tedious at best, so this section will demonstrate how to safely and effectively copy your RSA public key between the two systems so you don't have to provide your password each time you swap systems.
First, login to Peregrine via SSH. From your terminal, type:
[<username>@login4] $ ssh-copy-id <username>@el.hpc.nrel.gov
Where <username> should be replaced with your HPC account username. You will be prompted for your password immediately after executing this command. If successful, this places the public key that was generated on Peregrine into the list of authorized keys on Eagle, so you won't have to provide your password anymore.
This command will prompt you to login to el.hpc.nrel.gov to make sure the key was copied successfully, which you should do as part of the next step.
Once logged into Eagle, run the following to complete two-way access:
[<username>@el4] $ ssh-copy-id <username>@peregrine.hpc.nrel.gov
You should now be able to easily login to one system from the other. Be mindful of too many nested shells, as this may create other problems.
Allocation Management and Accounting
Prior to Eagle, allocation usage statistics could be seen by running the alloc_tracker command while logged into Peregrine. alloc_tracker has been deprecated because it does not have sophisticated enough logic to track usage across both Peregrine and Eagle, and the successor script hours_report should be used instead. Please run hours_report --help to see detailed usage information and advanced querying options.
As with Peregrine, projects that exhaust their allotment of NREL Hours may still submit jobs, but those jobs will be given minimum priority.
Eagle uses Slurm for job scheduling, whereas Peregrine used PBS. The overall workflow for submitting jobs is similar; however, the exact job-submission command syntax differs greatly from Peregrine's.
Below are a few conversions between the most common job scheduling commands:
| PBS Command | Analogous Slurm Command |
|-----------------------|-------------------------|
| qsub <job-executable> | sbatch <job-executable> |
| qdel <job_id> | scancel <job_id> |
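To illustrate the syntax differences, here is a minimal Slurm batch script with rough PBS equivalents in the comments; the allocation handle, resources, and job name are placeholders, not real values:

```shell
# Write a minimal Slurm batch script; every value below is a placeholder.
cat > submit_job.sh <<'EOF'
#!/bin/bash
#SBATCH --account=my_allocation   # PBS equivalent: #PBS -A my_allocation
#SBATCH --time=01:00:00           # PBS equivalent: #PBS -l walltime=01:00:00
#SBATCH --nodes=2                 # PBS equivalent: #PBS -l nodes=2
#SBATCH --job-name=example        # PBS equivalent: #PBS -N example
srun hostname                     # launch the job step on the allocated nodes
EOF
# Submit from a login node with: sbatch submit_job.sh
```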
For a comprehensive guide on using Slurm, see running jobs on the Eagle system.
Filesystems and Data Transfer
Eagle was provided with new storage hardware featuring larger capacity and faster access. Consequently, Eagle does not share a filesystem with Peregrine, which means users will need to copy or move any relevant data to the new filesystems. We recommend that users who need to transfer data see our short overview on data storage and transfer, which discusses the best tools for different data sizes.
Please note: Globus is the method we recommend for transferring data from Peregrine to Eagle. To get set up and start transferring your data more quickly than with conventional methods, see our documentation on Globus.
Notable filesystem mounts on Eagle differ slightly from those on Peregrine. Below are the various mountpoints and their intended use:
Each user has a personal directory under /home with a quota of 50 gigabytes. This is where shell startup files, scripts, source code, executables and other small files should reside.
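To see how close you are to the 50 GB quota, a simple disk-usage check works; note that du measures apparent usage, and any site-provided quota tool is authoritative:

```shell
# Summarize total disk usage of your home directory in human-readable form.
du -sh "$HOME"
```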
The /nopt directory on Eagle is where NREL-specific software, module files, and licensed software are kept.
Each user has a directory under /scratch. This is where data that may be accessed by several nodes at once should reside. /scratch uses parallel network transfer protocols and offers much higher bandwidth than the normal NFS mounts listed above. Files in /scratch that have not been accessed in 30 days will be deleted.
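To see which of your files are at risk of being purged, you can search by access time. The function name is made up for illustration; on Eagle you would point it at your own /scratch directory:

```shell
# list_purge_candidates prints regular files not accessed in the last
# 30 days, i.e. candidates for the automatic /scratch purge.
# Usage on Eagle: list_purge_candidates /scratch/$USER
list_purge_candidates() {
  find "$1" -type f -atime +30
}
```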
Each Eagle node has a "local scratch", the full path of which will be set under the $LOCAL_SCRATCH environment variable during a job and will be /tmp/scratch across all Eagle nodes. A node will not have read or write access to any other node's local scratch, only its own. These directories are for performant per-node file manipulation.
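A job script can use $LOCAL_SCRATCH like any other directory. The sketch below (the function and directory names are invented) creates a per-job working directory there, falling back to /tmp outside a job so it can be tried anywhere:

```shell
# make_workdir creates a unique working directory in node-local scratch.
# Inside a job, $LOCAL_SCRATCH is set by the system; outside a job we
# fall back to /tmp purely so the sketch is runnable anywhere.
make_workdir() {
  base=${LOCAL_SCRATCH:-/tmp}
  dir="$base/myjob_$$"
  mkdir -p "$dir"
  echo "$dir"
}
```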
Each project allocation has a directory in /projects to serve as a mutually accessible repository for all members of that project.
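Because /projects directories are shared, files you create there should be readable (and often writable) by your group. One common approach, sketched here with a throwaway filename, is to set a group-friendly umask before creating shared files; adjust to your project's conventions:

```shell
# With umask 002, new files are created mode 664 (group-writable) and new
# directories mode 775, so fellow project members can use what you create.
umask 002
touch shared_notes.txt            # a throwaway example file
stat -c '%a' shared_notes.txt     # prints 664 for a newly created file
```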
Much like a directory under /projects allows members of a project to mutually access files, a directory in /shared-projects allows mutual access by members of several projects. One can be requested from the HPC Operations team. Projects that splintered into several allocations due to recent changes in allocation policies may benefit from this, since it allows the child project allocations to access common data.
The /datasets directory on Eagle hosts widely-used datasets that are accessible across project allocations.