
Announcements

Read announcements for NREL’s high-performance computing (HPC) system users. 

Jupyterhub Documentation

Feb. 1, 2021

We have written a how-to guide for using the Europa Jupyterhub server, including setting up custom Python, Julia, and R kernels and interacting with Eagle. See https://www.nrel.gov/hpc/jupyterhub.html for more.

ESIF-HPC-3 Project Update

Feb. 1, 2021

We have pushed the proposal deadline out to February 18th and expect to have reviews completed sometime in mid-April. We have been fielding questions from various interested offerors, and are looking forward to seeing what they've got!

For those interested, the ESIF-HPC-3 Request for Proposals content can be found on beta.SAM.gov.


Application Updates

Feb. 1, 2021

  • Q-Chem has been upgraded to version 5.3.2. See changes here.
  • Star-CCM version 15.06.008 is available on Eagle.
  • ARM Forge version 20.2 is available on Eagle.
  • We are working on acquiring a Maintenance license for VASP 6. Once we have this in place, users will need to have an upgraded VASP 6 Research workgroup license in order to use our VASP 6 builds on Eagle.

HPC Office Hours

Feb. 1, 2021

HPC office hours are being held remotely via Teams on a bi-weekly basis, alternating between Tuesday and Thursday at 11 a.m. Please see the NREL HPC Training page.

Lustre Quotas

Nov. 3, 2020

Effective with the new Fiscal Year 2021 project allocations for Eagle, quotas matching the approved storage allocations have been implemented on /projects and MSS on Eagle. This is intended to encourage users to manage their /projects data usage and to use /scratch for jobs. HPC Operations is developing usage reporting capabilities; in the meantime, users may request help from the HPC Help Desk or use the procedures given in the full announcement from an Eagle login node.

Changes to Eagle Mass Storage System (MSS)

Nov. 3, 2020

What: NREL HPC Operations is in the process of retiring the on-premises MSS capability and has started using a cloud-based data storage capability.

Eagle Year In Review

Oct. 9, 2020

We hope and expect that FY20 was anomalous. In most years, HPC systems at NREL deliver more than 95% availability with far fewer interruptions. We have fixed hardware issues, patched and upgraded the OS and software, and increased our focus on communications. We do our very best to decrease the number and duration of both planned and unplanned outages that do occur...

Enforcing Quotas in /projects

Oct. 9, 2020

Effective for FY21 allocated projects, quotas will be implemented in Eagle's /projects to match the approved storage allocations. Users are strongly encouraged to use /scratch for their jobs and to move results to /projects.

Queue Wait Times

June 5, 2020

At the start of April, the 'queue depth' - the time it would take Eagle to complete all the jobs in the queue based on wall time limits - increased significantly from the historically common < 4 days to over 3 weeks. This isn't unprecedented, but it has persisted longer than in the past, and we realize this creates a challenge for users. Eagle is making progress and has reduced the backlog to a bit under 2 weeks. In light of the sustained high volume of work, we have taken the following actions:

  • Tuned scheduler performance for longer wait times. Previously it was optimized for wait times of up to 5 days.
  • Decreased ability of standby jobs to accumulate priority. Previously they would accumulate priority, albeit slowly, and eventually could run if they waited long enough. This should no longer happen if there are any 'non-standby' jobs in the queue. Standby jobs may still run, using Slurm backfill scheduling, if they do not impact the start time of any priority jobs.
  • Tentatively delayed development of a new system image to next quarter to reduce scheduled downtime this quarter
  • Implemented the published allocation reduction policy to bring remaining allocations into alignment with available hours
  • Improved introspection of log data to monitor throughput and queue performance to help with ongoing efforts to adapt to changing workloads. We plan to make this data visible through a web-based interface in the near future - look for an announcement in the upcoming weeks.
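
As a rough illustration of the 'queue depth' figure quoted above, the sketch below sums the node-hours requested by queued jobs (node count times wall time limit) and divides by the size of the machine. This is only one plausible reading of the definition, not the calculation NREL actually uses. The `QueuedJob` type, the `queue_depth_days` function, and the node counts are hypothetical, and the estimate assumes every job runs to its full wall time limit and that jobs pack perfectly onto the nodes.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class QueuedJob:
    """Hypothetical record for one queued job (not a Slurm data structure)."""
    nodes: int                   # number of nodes the job requests
    walltime_limit_hours: float  # requested wall time limit, in hours


def queue_depth_days(jobs: Iterable[QueuedJob], total_nodes: int) -> float:
    """Estimate days needed to drain the queue if every job runs to its limit.

    Assumes perfect packing of jobs onto nodes, so this is a simplified
    estimate rather than a prediction of actual wait time.
    """
    if total_nodes <= 0:
        raise ValueError("total_nodes must be positive")
    # Total work requested, in node-hours.
    node_hours = sum(job.nodes * job.walltime_limit_hours for job in jobs)
    # Hours to drain on the whole machine, converted to days.
    return node_hours / total_nodes / 24.0


if __name__ == "__main__":
    # Illustrative numbers only; these are not Eagle's real queue or size.
    example_queue = [QueuedJob(nodes=100, walltime_limit_hours=48),
                     QueuedJob(nodes=50, walltime_limit_hours=24)]
    print(queue_depth_days(example_queue, total_nodes=100))  # prints 2.5
```

Because jobs often finish before their wall time limit, an estimate of this kind tends to overstate how long the queue really takes to clear.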

The scheduling algorithm is sophisticated and designed to maximize the overall productivity of the machine, which sometimes makes it difficult to see why particular jobs start before others. The factors that go into calculating job priority are described on the following pages:

https://www.nrel.gov/hpc/eagle-job-partitions-scheduling.html

https://www.nrel.gov/hpc/eagle-job-priorities.html
