Cloud-Native Analysis of Southern California Waveform Data

Tim Clements, Julian F. Schmitt, & Marine A. Denolle

Published August 14, 2020, SCEC Contribution #10478, 2020 SCEC Annual Meeting Poster #193 (PDF)

Cloud computing offers a paradigm shift for data-heavy seismic analysis by bringing computation to the data rather than data to the computation. Southern California is the ideal testing ground for cloud-native analyses, as the entire archive of waveform data from the Southern California Earthquake Data Center going back to 2000 (~100 TB) is now freely available as an Open Data Set on Amazon Web Services (AWS). The data are stored as day-long miniSEED files in the “scedc-pds” Simple Storage Service (S3) bucket in the us-west-2 region.

Here, we present insights from processing data from the Southern California Earthquake Data Center (SCEDC) on AWS. We highlight a mix of AWS services: S3 for storage, Elastic Compute Cloud (EC2) for compute, Athena for serverless data queries, and Elastic Kubernetes Service (EKS) for autoscaling containerized compute. Using these services, we achieve transfer speeds from S3 storage to EC2 of up to 2 GB/s per EC2 instance. We introduce the SCEDC.jl API for interacting with SCEDC data on AWS in Julia. We present initial observations from cross-correlating data from the CI network using SeisNoise.jl, a Julia package for ambient noise cross-correlation.
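To put the quoted transfer rate in context, the following back-of-the-envelope sketch estimates how long it would take to read the full ~100 TB archive at the peak rate of 2 GB/s per instance. It assumes decimal units (1 TB = 10^12 bytes), a rate sustained at the reported maximum, and perfectly linear scaling across instances. None of these assumptions is claimed in the abstract, so the numbers are a lower bound on wall-clock time, not a measured result. The function name `transfer_hours` is hypothetical.

```python
ARCHIVE_BYTES = 100e12  # ~100 TB archive size quoted in the abstract (decimal TB assumed)
PEAK_RATE = 2e9         # up to 2 GB/s per EC2 instance, as quoted in the abstract


def transfer_hours(archive_bytes, rate_bytes_per_s, n_instances=1):
    """Idealized time in hours to move archive_bytes from S3 to EC2.

    ASSUMPTION: every instance sustains the peak rate and throughput
    scales linearly with the number of instances.
    """
    seconds = archive_bytes / (rate_bytes_per_s * n_instances)
    return seconds / 3600.0


if __name__ == "__main__":
    for n in (1, 10):
        hours = transfer_hours(ARCHIVE_BYTES, PEAK_RATE, n)
        print(f"{n:2d} instance(s): {hours:.1f} h")
```

Under these idealized assumptions a single instance would need roughly 14 hours to stream the whole archive, and ten instances between one and two hours.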

Key Words
cloud, AWS, cross-correlation, data

Citation
Clements, T., Schmitt, J. F., & Denolle, M. A. (2020, 08). Cloud-Native Analysis of Southern California Waveform Data. Poster Presentation at 2020 SCEC Annual Meeting.


Related Projects & Working Groups
Computational Science (CS)