Group B, Poster #112, Earthquake Geology

Automating Earthquake Field Data Parsing with Machine Learning: From Free-Text to Structured Observations

Neeraja Vasa, Harini L. Pootheri, Edric Pauk, Tran T. Huynh, Luke Blair, Kate Thomas, & Timothy Dawson

Poster Presentation

2025 SCEC Annual Meeting, Poster #112, SCEC Contribution #14487
Spatial data collected in the field after earthquakes are heterogeneous and require extensive manual post-processing before publication. The field observation dataset from the 2014 Napa earthquake took five years to publish because it was collected on paper, whereas data from the 2019 Ridgecrest earthquake, collected with form-based mobile apps, took one year. However, a significant amount of data was still received in non-standardized formats, creating an opportunity for automated parsing to reduce post-processing times further.

Parsing and standardizing observation data involves manually interpreting varied terminology, converting units, and reading free-text field notes across multiple input formats. Scientists at the USGS and CGS undertook this manual process for the fault rupture observation datasets from the Napa and Ridgecrest earthquakes. Using these datasets, we trained a machine learning (ML) model to parse free-text fields, extract data from them, and classify the extracted data into structured fields.
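
To make the manual interpretation step concrete, the sketch below shows a simple rule-based baseline in Python that maps slip-sense synonyms to one controlled term and converts an offset measurement to centimetres. It is not the ML model described in this abstract. The output field names (`slip_sense`, `offset_cm`), the synonym table, and the unit table are all hypothetical assumptions chosen for illustration.

```python
import re
from typing import Optional, TypedDict

# Hypothetical conversion factors to centimetres (the real schema's units are an assumption).
UNIT_TO_CM = {
    "mm": 0.1, "cm": 1.0, "m": 100.0,
    "in": 2.54, "inch": 2.54, "inches": 2.54,
    "ft": 30.48, "feet": 30.48,
}

# Hypothetical synonym table mapping field terminology to one controlled term.
SLIP_SENSE_TERMS = {
    "right-lateral": "right-lateral",
    "right lateral": "right-lateral",
    "dextral": "right-lateral",
    "left-lateral": "left-lateral",
    "left lateral": "left-lateral",
    "sinistral": "left-lateral",
}

# A number (optionally decimal) followed by a unit token; \b stops "m" matching "meters".
OFFSET_RE = re.compile(r"(\d+(?:\.\d+)?)\s*(mm|cm|m|inches|inch|in|feet|ft)\b")


class Observation(TypedDict):
    slip_sense: Optional[str]
    offset_cm: Optional[float]


def parse_note(note: str) -> Observation:
    """Extract hypothetical structured fields from one free-text field note."""
    text = note.lower()

    # Take the first synonym found (dict order is insertion order).
    slip_sense = None
    for term, canonical in SLIP_SENSE_TERMS.items():
        if term in text:
            slip_sense = canonical
            break

    # Convert the first number-plus-unit match to centimetres.
    offset_cm = None
    match = OFFSET_RE.search(text)
    if match:
        value, unit = match.groups()
        offset_cm = round(float(value) * UNIT_TO_CM[unit], 2)

    return {"slip_sense": slip_sense, "offset_cm": offset_cm}


if __name__ == "__main__":
    print(parse_note("Dextral offset of 12 in across curb"))
    print(parse_note("Left lateral, 3.5 cm offset in fence"))
```

Hand-written rules like these are brittle: a note such as "cracks 3 in places" would be misread as a 3-inch offset, which is the kind of ambiguity a learned model may handle better.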

Our model achieved an average accuracy of 88% in extracting structured data from free-text notes across the combined Napa and Ridgecrest datasets. This approach demonstrates the potential to reduce earthquake field data processing time from years to months. Future work will expand beyond fault rupture free-text parsing to other hazard types and additional data formats.
