SCEC Award Number 24133
Proposal Category Individual Research Project (Single Investigator / Institution)
Proposal Title Statewide California Earthquake Dataset for Machine Learning
Investigator(s)
Name Organization
Weiqiang Zhu University of California, Berkeley
SCEC Milestones A1-2 SCEC Groups Seismology, RC
Report Due Date 03/15/2025 Date Report Submitted 02/17/2025
Project Abstract
This project develops a comprehensive, standardized database to advance deep learning applications in seismology across the San Andreas Fault System (SAFS). It addresses the need for a unified dataset combining Northern and Southern California seismic data, which is essential for developing next-generation deep learning models. The project compiled approximately 4 million seismic waveforms by integrating records from the Northern California Earthquake Data Center (NCEDC) and the Southern California Seismic Network (SCSN). The dataset structure incorporates complete catalog and phase information, including event metadata, station details, and waveform data, supporting the development of more advanced deep learning models. Organizing the dataset by year and event ID simplifies maintenance and updates and establishes a foundation for future automated seismic analysis systems. This comprehensive dataset will help improve earthquake monitoring capabilities across the SAFS, advancing SCEC's mission of enhancing seismic hazard assessment and understanding.
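The year and event ID organization described above can be illustrated with a minimal sketch. This is not the dataset's actual file format or API: the field names (`event_id`, `event_time`, `station`), the placeholder values, and the helper `group_by_year_and_event` are all hypothetical and serve only to show how records keyed this way could be grouped and looked up.

```python
from collections import defaultdict
from datetime import datetime


def group_by_year_and_event(records):
    """Group waveform records as {year: {event_id: [records]}}.

    Hypothetical helper: the record fields used here are assumptions,
    not the dataset's real schema.
    """
    index = defaultdict(lambda: defaultdict(list))
    for rec in records:
        # Derive the year from the event origin time (ISO 8601 string).
        year = datetime.fromisoformat(rec["event_time"]).year
        index[year][rec["event_id"]].append(rec)
    # Convert nested defaultdicts to plain dicts for predictable lookups.
    return {year: dict(events) for year, events in index.items()}


# Placeholder records, one per (event, station) waveform.
records = [
    {"event_id": "ev_a", "event_time": "2020-01-01T00:00:00", "station": "STA1"},
    {"event_id": "ev_a", "event_time": "2020-01-01T00:00:00", "station": "STA2"},
    {"event_id": "ev_b", "event_time": "2021-06-15T12:30:00", "station": "STA1"},
]

index = group_by_year_and_event(records)
for year in sorted(index):
    for event_id, recs in index[year].items():
        print(year, event_id, [r["station"] for r in recs])
```

Under this kind of layout, adding a new year of data or correcting a single event touches only one group, which is one plausible reading of why the abstract describes maintenance and updates as easy.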
Intellectual Merit This project establishes a unified seismic dataset spanning the entire San Andreas Fault System, enabling systematic studies of fault behavior across regional boundaries. The dataset supports the development of more sophisticated deep learning models for earthquake detection, phase picking, and focal mechanism determination, creating new opportunities for understanding fault system dynamics and improving earthquake monitoring capabilities.
Broader Impacts This project enhances the infrastructure for seismological research by providing an open, standardized dataset that democratizes access to California's seismic data for researchers, students, and institutions worldwide.
The dataset will accelerate the development of machine-learning-based earthquake monitoring systems, ultimately contributing to more effective earthquake hazard assessment across California.
Project Participants Berkeley Seismological Laboratory, UC Berkeley:
Weiqiang Zhu (PI)
Haoyu Wang
Bo Rong
Stephane Zuzlewski
Taka'aki Taira
Julien Marty
Richard M Allen

Caltech Seismological Laboratory, Caltech:
Ellen Yu
Gabrielle Tepp
Allen Husker
Exemplary Figure Figure 6: Key characteristics of seismic events and waveforms in the CEED dataset: (a) distribution of epicentral distances, (b) distribution of event depths, (c) distribution of signal-to-noise ratios (SNR), (d) distribution of frequency indices, and (e) distribution of back azimuths. These distributions highlight the dataset's coverage of diverse seismic recording conditions.