Petaflop Seismic Simulations in the Public Cloud

Alexander N. Breuer, Yifeng Cui, & Alexander Heinecke

Accepted June 19, 2019, SCEC Contribution #9083

During the last decade, cloud services and infrastructure as a service have become a popular solution for diverse applications. Additionally, hardware support for virtualization has closed performance gaps compared to on-premises, bare-metal systems. This development is driven by offloaded hypervisors and full CPU virtualization. Today’s cloud service providers, such as Amazon or Google, offer the ability to assemble application-tailored clusters to maximize performance. However, from an interconnect point of view, one has to tackle a 4-5× slowdown in terms of bandwidth and 25× in terms of latency, compared to the latest high-speed and low-latency interconnects. Taking into account the high per-node and accelerator-driven performance of the latest supercomputers, we observe that the network bandwidth performance of recent cloud offerings is within 2× of large supercomputers. In order to address these challenges, we present a comprehensive application-centric approach for high-order seismic simulations utilizing the ADER discontinuous Galerkin finite element method, which exhibits excellent communication characteristics. This covers the tuning of the operating system, normally not possible on supercomputers, micro-benchmarking, and finally, the efficient execution of our solver in the public cloud. Due to this performance-oriented end-to-end workflow, we were able to achieve 1.09 PFLOPS on 768 AWS c5.18xlarge instances, offering 27,648 cores with 5 PFLOPS of theoretical computational power. This corresponds to an achieved peak efficiency of over 20% and a close to 90% parallel efficiency in a weak-scaling setup. In terms of strong scalability, we were able to strong-scale a science scenario from 2 to 64 instances with 60% parallel efficiency. This work is, to the best of our knowledge, the first of its kind at such a large scale.
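
The headline figures in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only and is not taken from the paper: the function names are hypothetical, the 5 PFLOPS peak is the rounded value quoted above, and the strong-scaling speedup assumes the usual definition of parallel efficiency as speedup divided by the ratio of instance counts.

```python
# Figures quoted in the abstract.
INSTANCES = 768
TOTAL_CORES = 27_648
ACHIEVED_PFLOPS = 1.09
PEAK_PFLOPS = 5.0  # rounded theoretical peak as stated in the abstract


def cores_per_instance(total_cores: int, instances: int) -> int:
    """Physical cores per instance implied by the quoted totals."""
    return total_cores // instances


def peak_efficiency(achieved: float, peak: float) -> float:
    """Fraction of theoretical peak that was sustained."""
    return achieved / peak


def implied_speedup(parallel_efficiency: float, n_from: int, n_to: int) -> float:
    """Speedup implied by a parallel efficiency when scaling n_from -> n_to.

    Assumes efficiency = speedup / (n_to / n_from); the paper's exact
    definition is not given in the abstract.
    """
    return parallel_efficiency * (n_to / n_from)


if __name__ == "__main__":
    print("cores per instance:", cores_per_instance(TOTAL_CORES, INSTANCES))
    print(f"peak efficiency: {peak_efficiency(ACHIEVED_PFLOPS, PEAK_PFLOPS):.1%}")
    print(f"implied strong-scaling speedup: {implied_speedup(0.60, 2, 64):.1f}x")
```

Under these assumptions, the quoted totals are mutually consistent: the core count divides evenly across the instances, and the sustained performance is a little over one fifth of the stated peak.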

Citation
Breuer, A. N., Cui, Y., & Heinecke, A. (2019, 06). Petaflop Seismic Simulations in the Public Cloud. Oral Presentation at ISC High Performance 2019. http://dial3343.org/pub/papers/19_03_26_isc_19.pdf