SCEC Project Details
SCEC Award Number | 19176
Proposal Category | Individual Proposal (Integration and Theory)
Proposal Title | Unified and Continuous Software Development for AWP-ODC-OS: Phase II
Investigator(s) |
Other Participants | 1 graduate student TBD
SCEC Priorities | 4c, 4a, 2b
SCEC Groups | CS, GM, EFP
Report Due Date | 04/30/2020
Date Report Submitted | 09/22/2022
Project Abstract | In this project, we are transferring our initial continuous software development infrastructure for AWP-ODC-OS to a software-as-a-service stack. While the previous SCEC project focused on improving the code quality of AWP-ODC-OS itself, this work shifts gears: the targets are the extensibility and scalability of the respective continuous integration and delivery (CI/CD) processes. We exercise the new service on SDSC’s Expanse, which features direct scheduler integration with AWS and leverages high-speed networks to ease data movement to and from the cloud. We also demonstrate a first-hand cloud experience with EDGE in AWS, sustaining 1.09 Pflop/s in weak scaling on 768 instances.
Intellectual Merit | The continuous integration and delivery (CI/CD) structure developed for AWP-ODC-OS in this project demonstrates the layout of the respective repositories and provides insight into the technical aspects of the software and data implementation. The implemented CI/CD pipelines automatically analyze AWP-ODC-OS at several levels, ranging from simple sanity checks (e.g., memory debugging) to continuous verification on supercomputing infrastructure. This transition to modern, comprehensive software engineering practices starts with AWP-ODC-OS and is tailored to SCEC’s long-term community open-source needs, with the potential to extend the CI/CD server setups into public clouds such as Amazon's AWS EC2.
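To illustrate the continuous-verification end of such a pipeline, here is a minimal Python sketch; it is not the project's actual implementation. It assumes a hypothetical setup in which each verification run produces receiver time series (seismograms) that are compared against stored reference traces using a relative L2 misfit. The function names, the receiver-keyed dictionaries, and the default tolerance of 1e-3 are all illustrative assumptions.

```python
import math


def relative_l2_misfit(candidate, reference):
    """Relative L2 misfit ||c - r|| / ||r|| between two equally sampled traces."""
    if len(candidate) != len(reference):
        raise ValueError("traces must have the same number of samples")
    diff_norm = math.sqrt(sum((c - r) ** 2 for c, r in zip(candidate, reference)))
    ref_norm = math.sqrt(sum(r ** 2 for r in reference))
    if ref_norm == 0.0:
        raise ValueError("reference trace is identically zero")
    return diff_norm / ref_norm


def verify_receivers(candidate, reference, tol=1e-3):
    """Hypothetical verification step: compare every reference receiver.

    candidate, reference: dicts mapping receiver name -> list of samples.
    Returns (passed, failures), where failures maps receiver name -> misfit
    (infinity if the candidate run did not produce that receiver at all).
    """
    failures = {}
    for name, ref_trace in reference.items():
        if name not in candidate:
            failures[name] = float("inf")
            continue
        misfit = relative_l2_misfit(candidate[name], ref_trace)
        if misfit > tol:
            failures[name] = misfit
    return not failures, failures
```

In a CI job, a non-empty `failures` dictionary would mark the pipeline stage as failed. Comparing with a relative tolerance rather than exact equality allows for the bit-level differences that compilers and parallel reductions typically introduce across machines.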
Broader Impacts | This work helps translate basic research into practical products that reduce risk and improve community resilience. Cloud solutions are the best available approach to tackling this degree of diversity, and continuous development improves the capability of wave propagation codes, moving toward automated continuous verification on supercomputing infrastructure.
Exemplary Figure | Figure 2: Weak and strong scalability of EDGE in AWS EC2 on c5.18xlarge instances. We sustained 1.09 Pflop/s in weak scaling on 768 instances. This elastic high-performance cluster contained 27,648 Skylake-SP cores with a peak performance of 5 Pflop/s. The strong-scaling setting on 64 instances achieved 53 Tflop/s. This work was published as a technical paper at ISC 2019.
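The numbers in the figure caption can be cross-checked with a few lines of arithmetic. The Python sketch below assumes 36 physical cores per c5.18xlarge instance (72 vCPUs with two hardware threads per core), which is consistent with 27,648 cores on 768 instances; the helper name `summarize_run` is hypothetical.

```python
def summarize_run(instances, sustained_tflops, cores_per_instance=36, peak_tflops=None):
    """Derive core count, per-instance throughput and (optionally) fraction of peak."""
    summary = {
        "cores": instances * cores_per_instance,
        "tflops_per_instance": sustained_tflops / instances,
    }
    if peak_tflops is not None:
        summary["fraction_of_peak"] = sustained_tflops / peak_tflops
    return summary


# Weak scaling: 1.09 Pflop/s sustained on 768 instances, 5 Pflop/s peak.
weak = summarize_run(768, 1090.0, peak_tflops=5000.0)
# Strong scaling: 53 Tflop/s on 64 instances.
strong = summarize_run(64, 53.0)

for label, run in (("weak", weak), ("strong", strong)):
    print(label, run)
```

For the weak-scaling run this gives 27,648 cores, about 1.42 Tflop/s per instance, and roughly 22% of the stated peak; the 64-instance strong-scaling run corresponds to about 0.83 Tflop/s per instance.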