Why am I being "taxed" 3× to use Eagle instead of Peregrine?
June 04, 2019
Eagle represents the state-of-the-art architecture as of 2018, whereas Peregrine does the same for roughly 2013. All aspects of computational hardware have advanced substantially in the interim, so below we'll look at a few of them, and at how they might affect the amount of work you can get done in an hour on an Eagle node versus an hour on a Peregrine node.
- Core count: Peregrine's densest nodes had 24 cores; all Eagle nodes have 36, a 50% increase per node. Codes that exploit shared-memory parallelism, distributed-memory codes that fit on a single node, and workflows that keep all cores busy can do 1.5× the work of a Peregrine node from this factor alone.
- Clock speed: Peregrine's Haswell processors run at 2.3 GHz, whereas Eagle's Skylake processors nominally run at 3.0 GHz. Thus, in any given time increment one expects to do 3.0/2.3 ≈ 1.3 times as much work per core on Eagle as on Peregrine.
- Vector widths: Skylake processors host 512-bit wide AVX-512 registers and floating-point vector units, twice the width of Haswell's 256-bit wide AVX2 registers. This can in principle double throughput for amenable code sections.
- Memory: Eagle uses faster 2666 MHz memory, whereas Peregrine's Haswell nodes use 2133 MHz memory, a 25% speed boost. Skylake also supports 6 memory channels per socket vs. Haswell's 4, another 50% boost in peak memory bandwidth.
- Socket interconnect: Skylake's Ultra Path Interconnect runs at 10.4 GT/s between sockets, vs. 6.4 GT/s for Haswell's QuickPath Interconnect, roughly a 1.6-fold increase.
- InfiniBand EDR vs. FDR: Eagle's EDR InfiniBand can show substantial performance increases over Peregrine's FDR (e.g., https://insidehpc.com/2016/02/fdr-and-edr-infiniband). Nominally, 4-lane EDR supports 12 GB/s of bandwidth vs. 7 GB/s for FDR, a 1.7-fold increase. Source: https://www.infinibandta.org/infiniband-roadmap-charting-speeds-for-future-needs/
- PFS bandwidth and latency: Eagle's roughly 250 GB/s peak bandwidth to the Lustre filesystem is over 6× Peregrine's. In addition, the SSD metadata drives on Eagle's parallel filesystem should accelerate IOPS-intensive workloads compared with Peregrine.
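The nominal ratios quoted above can be collected in a quick back-of-the-envelope calculation. This is purely illustrative — these are the advertised hardware numbers from the list, not measured speedups, and real codes see only the factors tied to the resources that actually limit them:

```python
# Nominal Eagle-vs-Peregrine hardware ratios from the list above.
# Illustrative only: no real application benefits from all of these at once.
factors = {
    "cores per node":              36 / 24,      # 1.5x
    "clock speed (GHz)":           3.0 / 2.3,    # ~1.3x per core
    "vector width (bits)":         512 / 256,    # 2x, vectorizable code only
    "memory speed (MHz)":          2666 / 2133,  # ~1.25x
    "memory channels per socket":  6 / 4,        # 1.5x
    "socket interconnect (GT/s)":  10.4 / 6.4,   # ~1.6x
    "InfiniBand bandwidth (GB/s)": 12 / 7,       # ~1.7x (EDR vs. FDR)
}

for name, ratio in factors.items():
    print(f"{name:28s} {ratio:.2f}x")
```

Note that these factors do not simply multiply: a memory-bandwidth-bound code sees roughly the memory ratios and little else, while a compute-bound code sees the core-count, clock, and (if vectorized) vector-width factors.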
Working out computational bottlenecks is not straightforward, so no single factor can predict the numerical advantage you might see from Eagle. However, based both on the difference in High-Performance Linpack results and on our internal throughput benchmarks, which comprise more realistic and balanced workloads, Eagle offers a 3.5-fold improvement over Peregrine in time-to-solution.
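One way to see why no single factor predicts time-to-solution is a simple Amdahl-style model: if an application spends different fractions of its runtime limited by different resources, each accelerated by its own factor, the overall speedup is the harmonic combination of the per-resource speedups. The fractions below are hypothetical, chosen only to illustrate the arithmetic; actual speedups depend entirely on your code's real bottleneck profile:

```python
def overall_speedup(fractions_and_speedups):
    """Amdahl-style model: runtime fractions f_i, each accelerated by s_i.
    New relative runtime = sum(f_i / s_i); overall speedup is its inverse."""
    return 1.0 / sum(f / s for f, s in fractions_and_speedups)

# Hypothetical application profile (assumed, not measured):
# 60% compute-bound, 30% memory-bound, 10% network-bound,
# with per-resource speedups loosely based on the nominal ratios above.
model = [
    (0.6, 1.5 * 1.3),   # compute: more cores x higher clock
    (0.3, 1.25 * 1.5),  # memory: faster DIMMs x more channels
    (0.1, 12 / 7),      # network: EDR vs. FDR bandwidth
]
print(f"modeled speedup: {overall_speedup(model):.2f}x")  # ~1.90x
```

Shifting those fractions, or adding vectorization to the compute term, moves the modeled speedup substantially in either direction — which is exactly why we rely on measured throughput benchmarks rather than any one hardware ratio.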
If you are not seeing that in your own work, there is a good chance that either your workflows or codes could benefit from consultation. Don't hesitate to contact HPC-Help@nrel.gov; we can at least provide an informed assessment of where time might best be spent in extracting performance from Eagle that will make the 3-fold higher allocation burn rate seem like a deal!