
Announcements

Read announcements for NREL’s high-performance computing (HPC) system users. 

Stability of Large Jobs

June 5, 2020

In early May, HPE and NREL formed a joint technical team with Mellanox to investigate errors that appeared intermittently on Eagle under heavy network load and were reducing large-job productivity. Last Friday, the team identified the root cause and is now working on a mitigation strategy along with an eventual long-term solution. Specifically, the issue lies in the version of OFED (OpenFabrics Enterprise Distribution) that the system was delivered with. OFED ties the Linux kernel, the InfiniBand adapters, the fabric, and the MPI software together; we have identified a version of OFED and associated firmware to upgrade to so that large jobs run more reliably. Deploying these fixes requires building a new image for Eagle, which in turn requires a full system outage, and we are sensitive to the large backlog of work in the Eagle queue. In the meantime, jobs placed on adjacent nodes appear to be less affected, so topology-aware placement may help.
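Topology-aware placement can be requested at submission time. The sketch below assumes Eagle schedules jobs with Slurm using a tree network topology (an assumption; this announcement does not say how placement should be requested). It uses sbatch's `--switches=<count>[@max-time]` option, which asks Slurm to place all of a job's nodes under at most `<count>` leaf switches and to keep the job pending for up to `<max-time>` waiting for such an allocation. The helper name `build_sbatch_cmd`, the script `job.sh`, and the one-hour wait are hypothetical.

```python
import shlex


def build_sbatch_cmd(script, nodes, account, max_switches=1, max_wait="1:00:00"):
    """Return an sbatch argument list that requests topology-aware placement.

    --switches=<count>@<max-time> asks Slurm to keep all nodes under at most
    <count> leaf switches, pending for up to <max-time> for such a placement.
    The function name and defaults are illustrative, not an NREL tool.
    """
    return [
        "sbatch",
        f"--account={account}",
        f"--nodes={nodes}",
        f"--switches={max_switches}@{max_wait}",
        script,
    ]


if __name__ == "__main__":
    # Hypothetical project handle and job script; substitute your own.
    cmd = build_sbatch_cmd("job.sh", nodes=16, account="csc000")
    print(shlex.join(cmd))
    # To submit on a system with Slurm installed:
    #   import subprocess; subprocess.run(cmd, check=True)
```

Sites can cap how long a job may wait for its requested switch count (Slurm's `max_switch_wait` scheduler parameter), and insisting on a single switch can lengthen queue time for large jobs, so a modest wait is a reasonable starting point.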

Upgrade Image on Eagle

June 5, 2020

Eagle has been running a CentOS 7.4 image since its deployment in late 2018. The ACO team has been working on upgrading the image to CentOS 7.7, which is expected to roll out this July.

The new image will include an upgraded kernel, Lustre client, OFED drivers, and InfiniBand HCA firmware. This update will address requests for a newer kernel, issues affecting large-job productivity, and vulnerabilities in the current kernel, and it will align Eagle with the new software/firmware stack recipe provided by HPE and Mellanox.

FastX on the user-facing Data Analysis and Visualization (DAV) nodes will be upgraded to a newer release to address current load-balancing issues.


Cybersecurity on Eagle

June 5, 2020

The Advanced Computing Operations team has been working on a DOE-mandated cybersecurity assessment and mitigation effort, tied to a PEMP goal, that is due to be completed in mid-June; work to reduce patchable security vulnerabilities continues. The last day for reportable updates will be June 12, after which final reports will be generated, all data will be compiled, and the information will be sent to DOE.

Eagle Expansion nodes for AMO and WETO

June 5, 2020

This week, the Advanced Computing Operations team is working with Hewlett Packard Enterprise to install 432 additional nodes and 2 petabytes of storage on Eagle, purchased by AMO and WETO. The installation team of about 10 people is placing, plumbing, filling (with water), connecting, configuring, and testing the new equipment. Installation started at 08:00 on Monday, June 1, and is expected to be complete by close of business on Friday, June 5. After a successful installation, the new hardware will be exercised and tested, then released to production next week.


Advanced Computing Operations

May 11, 2020

HPC Systems and Operations is now Advanced Computing Operations. The ACO team supports operation of the ESIF HPC User Facility and NREL projects using Amazon Web Services (AWS) Cloud.

Intended Use of /projects and /scratch

May 11, 2020

/projects and /scratch are shared resources on Eagle. We encourage users to review the published Shared Storage Usage Policy.

/projects is intended for approved Eagle allocation projects and should contain only the critical data and programs the project needs to succeed, up to the capacity approved in the allocation request in Lex (https://lex.hpc.nrel.gov/projects/<Lex project number>/award/). Critical data in /projects should be copied regularly to the Mass Storage System (MSS). We anticipate that quotas matching approved allocations for /projects will be implemented in the near future.

To see a project's usage on Eaglefs, run the following command, substituting your project name for csc000:


VASP fix for implicit solvation package

May 11, 2020

The HPC installations of VASP 5.4.4 (accessible from the vasp/5.4.4 module) will be updated on May 8th to address an issue with the VASPsol implicit solvation model. As part of this update, the software dependencies will be updated to Intel MPI and MKL 2019 and GCC 8.2.0. These changes should be transparent and will not require changes to your workflow.

The existing binaries will remain accessible with a ".1" suffix, e.g., vasp_gam.1.


Appropriate Use of /scratch and /projects

April 24, 2020

NREL HPC allocations include storage space in the /projects filesystem. This is a shared resource, and project leads should monitor usage periodically. Metadata usage on /scratch and /projects is currently high: many empty directories in both filesystems consume space, metadata space in particular, and we could use your help cleaning them up. Please contact hpc-help@nrel.gov if you need help assessing your data or deleting empty directories, and visit our website to review the Shared Storage Usage Policy.
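Before deleting anything, it helps to see which directories are actually empty. The following is a minimal sketch, assuming you point it at your own subdirectory of /scratch or /projects; `find_empty_dirs` is a hypothetical helper, not an NREL-provided tool. It only lists candidates and deletes nothing. A directory counts as empty here if no file or symlink exists anywhere beneath it, so chains of nested empty directories are reported together. Walking a large tree itself generates metadata load, so target a specific subdirectory rather than a whole filesystem.

```python
import os
import sys


def find_empty_dirs(root):
    """Return directories under root (excluding root) that contain no files.

    The walk is bottom-up (topdown=False), so every child is classified
    before its parent. Results come out deepest-first, which is a safe
    order for removing them later with os.rmdir.
    """
    file_free = set()  # directories known to contain no files at any depth
    empty = []
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        if filenames:
            continue  # any file, or symlink to a file, means "not empty"
        # Symlinks to directories count as content; unreadable subdirectories
        # are never added to file_free, so their parents stay "not empty".
        children_empty = all(
            not os.path.islink(os.path.join(dirpath, d))
            and os.path.join(dirpath, d) in file_free
            for d in dirnames
        )
        if children_empty:
            file_free.add(dirpath)
            if dirpath != root:
                empty.append(dirpath)
    return empty


if __name__ == "__main__":
    # Usage (hypothetical path): python find_empty_dirs.py /scratch/<user>/run01
    for path in find_empty_dirs(sys.argv[1] if len(sys.argv) > 1 else "."):
        print(path)
```

After reviewing the list, the directories can be removed in the order printed with `os.rmdir`, which refuses to delete a directory that is not empty.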

Don't Hold Software Licenses!

April 24, 2020

Network floating licenses are a shared resource. Whenever you open an application window, a license is pulled from the pool and becomes unavailable to other Eagle users. Please do NOT keep idle windows open; if you are not actively using the application, close it to return its licenses to the pool.


Allocation Reduction Policy

April 24, 2020

The allocation reduction policy will be applied on April 10th. As a reminder, in FY 2020 allocations are reduced quarterly: shortly after the end of Quarter 1, Quarter 2, and Quarter 3, allocations will be automatically adjusted to account for low utilization relative to planned usage. Rather than "use or lose," a percentage of the initial allocation amount will be removed from the remaining balance. Please communicate changes to your spend plan to us as early as possible. For more information, see the Allocation Reductions page on our website.