Manage I/O Task in a Normalized Cross-Correlation Earthquake Detection Code for Large Seismic Datasets

Dawei Mu, Pietro Cicotti, & Yifeng Cui

Published August 15, 2017, SCEC Contribution #7711, 2017 SCEC Annual Meeting Poster #276

We have developed high-performance GPU-based software, “cuda Normalized Cross-Correlation” (cuNCC), that calculates seismic waveform similarity for applications such as hypocenter estimation and small-earthquake detection. We present the performance and I/O optimizations applied in the cuNCC code.

Our GPU-based template matching algorithm (TMA) is designed to make full use of the fast on-board/on-chip memory of modern GPU architectures, including registers, constant memory, and shared memory. In applications involving many templates, the algorithm achieves high efficiency by introducing a new data-reuse feature into the algorithmic design. cuNCC reaches 2912 Gflop/s on a single Pascal P100 GPU, a speedup of more than 1,600x over a common sequential CPU code.
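As a point of reference, the quantity being computed can be sketched in a few lines of plain Python. This is only a CPU illustration of normalized cross-correlation with many equal-length templates, not the cuNCC kernel: the function names `normalize` and `ncc_many_templates` are hypothetical, and the reuse shown here (each trace window is normalized once and shared by every template) is an assumed analogy for the data-reuse idea, not the authors' GPU scheme.

```python
import math


def normalize(x):
    """Return x with its mean removed and scaled to unit L2 norm."""
    mean = sum(x) / len(x)
    centered = [v - mean for v in x]
    norm = math.sqrt(sum(v * v for v in centered))
    if norm == 0.0:
        return [0.0] * len(x)  # flat segment: define the coefficient as 0
    return [v / norm for v in centered]


def ncc_many_templates(trace, templates):
    """NCC of every template against every window of the trace.

    All templates must share one length. Returns result[t][i], the
    coefficient of template t at window offset i.
    """
    n = len(templates[0])
    if any(len(t) != n for t in templates):
        raise ValueError("templates must have equal length")
    unit_templates = [normalize(t) for t in templates]  # normalized once
    n_windows = len(trace) - n + 1
    result = [[0.0] * n_windows for _ in templates]
    for i in range(n_windows):
        # The window is normalized once and reused by every template.
        window = normalize(trace[i:i + n])
        for t, tmpl in enumerate(unit_templates):
            result[t][i] = sum(a * b for a, b in zip(tmpl, window))
    return result
```

Because both the template and the window are demeaned and scaled to unit norm, each coefficient lies between -1 and 1, and a scaled copy of a template embedded in the trace yields a coefficient of 1 at the matching offset.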

I/O efficiency became a significant bottleneck in cuNCC’s overall performance. Our I/O benchmarking results show that using a shared-memory virtual filesystem as a buffer for cuNCC output gives the best I/O efficiency, especially when the similarity coefficients are intermediate results for subsequent computation. When a shared-memory virtual filesystem is unavailable, we recommend buffering in CPU memory to reduce disk access frequency on low-bandwidth I/O devices. For high-bandwidth I/O devices, we suggest writing results directly to storage without buffering.
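The two buffering options can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the cuNCC output path: the class `BufferedResultWriter`, the helper `pick_output_dir`, the float32 output format, and the `/dev/shm` mount point (the usual location of a shared-memory filesystem on Linux) are all hypothetical choices made for the example.

```python
import os
import tempfile
from array import array


class BufferedResultWriter:
    """Collect float32 coefficients in host memory and write them in large chunks."""

    def __init__(self, path, buffer_items=1 << 20):
        self.buffer_items = buffer_items  # flush threshold, in coefficients
        self.buffer = array("f")
        self.flush_count = 0              # number of chunked writes issued
        self.file = open(path, "wb")

    def write(self, coefficients):
        """Append coefficients; write to the file only when the buffer is full."""
        self.buffer.extend(coefficients)
        if len(self.buffer) >= self.buffer_items:
            self.flush()

    def flush(self):
        """Write the buffered coefficients as one chunk and empty the buffer."""
        if self.buffer:
            self.file.write(self.buffer.tobytes())
            self.flush_count += 1
            self.buffer = array("f")

    def close(self):
        self.flush()
        self.file.close()


def pick_output_dir(shm_dir="/dev/shm"):
    """Prefer a writable shared-memory filesystem, else fall back to a temp directory."""
    if os.path.isdir(shm_dir) and os.access(shm_dir, os.W_OK):
        return shm_dir
    return tempfile.gettempdir()
```

A caller would open the writer on a path under `pick_output_dir()`, call `write` once per batch of coefficients, and call `close` at the end. Setting `buffer_items` to 1 approximates the unbuffered, write-through behavior suggested for high-bandwidth devices.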

We performed a realistic production run to evaluate the cuNCC code, using 21,325 template waveforms of 256 samples each. The seismogram dataset consists of all continuous recordings from 43 stations over 4 weeks. The entire TMA process involves over 4 trillion NCC calculations. cuNCC took 26 minutes on the Pascal P100; in comparison, an optimized parallel CPU code would take 21 hours on an 18-core Xeon E7-8867. As a science application, the new TMA code detected more than 4 times as many aftershocks as were cataloged by the Central Weather Bureau in Taiwan.
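The timings quoted above imply a speedup of roughly 48 times over the parallel CPU run. The short calculation below derives that figure and a throughput estimate using only the numbers in this abstract. The variable names are illustrative, and because the abstract gives the NCC count only as "over 4 trillion", the throughput is a lower bound.

```python
gpu_minutes = 26.0            # cuNCC run time on the Pascal P100
cpu_hours = 21.0              # estimated run time of the parallel CPU code
ncc_count_lower_bound = 4e12  # "over 4 trillion" NCC calculations

# Ratio of CPU run time to GPU run time, both expressed in minutes.
speedup = cpu_hours * 60.0 / gpu_minutes

# NCC calculations completed per second of GPU run time.
ncc_per_second = ncc_count_lower_bound / (gpu_minutes * 60.0)

print(f"speedup over the 18-core CPU run: about {speedup:.0f}x")
print(f"NCC throughput on the P100: at least {ncc_per_second:.2e} per second")
```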

Key Words
earthquake detection, I/O, CUDA

Mu, D., Cicotti, P., & Cui, Y. (2017, 08). Manage I/O Task in a Normalized Cross-Correlation Earthquake Detection Code for Large Seismic Datasets. Poster Presentation at 2017 SCEC Annual Meeting.

Related Projects & Working Groups
Computational Science (CS)