Why am I being "taxed" 3× to use Eagle instead of Peregrine?

June 04, 2019

Eagle represents the state-of-the-art architecture as of 2018, whereas Peregrine does the same for ~2013. All aspects of computational hardware have advanced substantially in the interim, so we'll look at a few below, and how those might impact the amount of work you can get done given an hour on an Eagle node, vs. an hour on a Peregrine node.

  • Core count: Peregrine's densest nodes had 24 cores. All Eagle nodes have 36 cores, a 50% increase per node. Shared-memory parallel codes, distributed-memory parallel codes that fit on one node, and workflows that efficiently use every core can do 150% of the work that can be done on Peregrine from this factor alone.
  • Clock speed: Peregrine's Haswell nodes run at 2.3 GHz, whereas the Skylake processors on Eagle nominally run at 3.0 GHz. Thus, in any given time increment one expects to do 3.0/2.3 = 1.3 times as much work on Eagle as on Peregrine per core.
  • Vector widths: Skylake processors host 512-bit wide AVX-512 registers and floating-point vector units, twice the width of Haswell's 256-bit wide AVX2 registers. This can in principle double throughput for amenable code sections.
  • Memory: Eagle uses faster 2666 MHz memory, whereas Peregrine Haswell nodes use 2133 MHz, a 25% speed boost. Skylake also supports 6 memory channels per socket vs. 4 for Haswell, providing another 50% boost on this basis.
  • Ultra Path Interconnect vs. QuickPath Interconnect: 10.4 GT/s vs. 6.4 GT/s between sockets.
  • InfiniBand EDR vs. FDR: Eagle's EDR InfiniBand can show substantial performance increases over Peregrine's FDR (e.g., https://insidehpc.com/2016/02/fdr-and-edr-infiniband). Nominally, 4-lane EDR supports 12 GB/s bandwidth vs. 7 GB/s for FDR, a nominal 1.7-fold increase. Source: https://www.infinibandta.org/infiniband-roadmap-charting-speeds-for-future-needs/
  • PFS bandwidth and latency: Eagle's peak bandwidth of roughly 250 GB/s to the Lustre filesystem is over 6× Peregrine's. In addition, the SSD metadata drives on Eagle's parallel filesystem should accelerate IOPS-intensive workloads compared with Peregrine.
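
To make the arithmetic in the list above easy to check, here is a minimal Python sketch that recomputes each nominal Eagle-to-Peregrine ratio from the figures quoted. The names `factors`, `ratios`, `compute_bound`, and `memory_bound` are illustrative, not part of any NREL tool. The two combined figures are simple products that assume ideal scaling for a purely compute-bound or purely memory-bandwidth-bound code; real applications will typically land somewhere else.

```python
# Nominal hardware figures quoted in the list above: (Eagle, Peregrine).
factors = {
    "cores per node": (36, 24),
    "clock speed (GHz)": (3.0, 2.3),
    "vector width (bits)": (512, 256),
    "memory speed (MHz)": (2666, 2133),
    "memory channels per socket": (6, 4),
    "inter-socket link (GT/s)": (10.4, 6.4),
    "InfiniBand bandwidth (GB/s)": (12, 7),
}


def ratios(table):
    """Return the Eagle/Peregrine ratio for every entry in the table."""
    return {name: eagle / peregrine for name, (eagle, peregrine) in table.items()}


r = ratios(factors)
for name, value in r.items():
    print(f"{name:<30} {value:.2f}x")

# Illustrative upper bounds only (ASSUMPTION: perfect scaling, one bottleneck).
# Compute-bound, non-vectorized code: more cores, each running faster.
compute_bound = r["cores per node"] * r["clock speed (GHz)"]
# Memory-bandwidth-bound code: faster memory across more channels.
memory_bound = r["memory speed (MHz)"] * r["memory channels per socket"]

print(f"{'compute-bound node (ideal)':<30} {compute_bound:.2f}x")
print(f"{'memory-bound node (ideal)':<30} {memory_bound:.2f}x")
```

No single product here should be read as a prediction; the point is only that several independent factors each contribute a gain, and which ones matter depends on where a given code spends its time.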

Working out computational bottlenecks is not straightforward, so no single factor can predict the numerical advantage you might see from Eagle. However, both High Performance Linpack results and our internal throughput benchmarks, which are composed of more realistic and balanced workloads, show Eagle offering a 3.5-fold improvement over Peregrine in time-to-solution.

If you are not seeing that in your own work, there is a good chance that either your workflows or codes could benefit from consultation. Don't hesitate to contact HPC-Help@nrel.gov; we can at least provide an informed assessment of where time might best be spent in extracting performance from Eagle that will make the 3-fold higher allocation burn rate seem like a deal!
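
As a rough way to judge whether the higher burn rate pays off for a particular job, the sketch below compares allocation spent on Eagle with allocation spent on Peregrine for the same work. It assumes, as a simplification, that the charge is proportional to node-hours multiplied by a flat factor of 3; the function name `relative_cost` and the sample speedups other than 3.5 are hypothetical.

```python
def relative_cost(speedup, charge_factor=3.0):
    """Allocation spent on Eagle per unit spent on Peregrine for the same job.

    speedup       -- how many times faster the job finishes on Eagle
    charge_factor -- ASSUMED flat multiplier on Eagle node-hours (3x here)
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return charge_factor / speedup


# 3.5 is the benchmark figure quoted above; 1.5 and 3.0 are made-up
# examples of a poorly performing job and the break-even point.
for s in (1.5, 3.0, 3.5):
    cost = relative_cost(s)
    verdict = "cheaper" if cost < 1 else "same" if cost == 1 else "costlier"
    print(f"speedup {s:.1f}x -> relative cost {cost:.2f} ({verdict} than Peregrine)")
```

Under these assumptions a job breaks even at a 3-fold speedup, and anything above that spends less allocation on Eagle than it would have on Peregrine.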