Fault lines and field notes: Machine learning-driven parsing of post-earthquake field data into structured observations

Harini L. Pootheri, Neeraja Vasa, Edric Pauk, Tran T. Huynh, Luke Blair, Kate Thomas, & Timothy Dawson

Submitted September 7, 2025, SCEC Contribution #14507, 2025 SCEC Annual Meeting Poster #TBD

After an earthquake, field teams are deployed to collect perishable data at locations with visible geological effects. These field data often combine varying schemas, mixed formats, and unstructured notes, and the inconsistencies across datasets make them difficult to convert into a standardized structure. Raw field data must be cleaned, interpreted, and mapped into a consistent structure before they can serve as a reliable source for scientific analysis. Currently, this post-processing is done manually and meticulously, which is time-consuming and resource-intensive. Whereas geologists have traditionally mapped free-text notes to the appropriate fields by hand, our approach automates this step with machine learning. Using datasets from the 2014 Napa and 2019 Ridgecrest earthquakes that contain observations recorded by geologists, we first mapped the field names in the Napa and Ridgecrest schemas to the most recent “Current” schema according to their meaning in a geological context. The Current schema is the latest field structure developed by the United States Geological Survey (USGS) and the California Geological Survey (CGS) to unify earthquake field data across events and enable consistent analysis. Because the Napa and Ridgecrest schemas were developed independently, in 2014 and 2019 respectively, they contain discrepancies, which our work aims to resolve through schema mapping. Next, we wrote a Python script to apply the schema mapping to the Current schema, trained a multi-label machine learning model to classify free-text notes, and validated the model’s predictions against the Napa and Ridgecrest datasets. The goal of the model is to automate the parsing and extraction of raw spatial data recorded as free-text notes into schema-conforming data that can be shared as preliminary datasets shortly after an earthquake.
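A minimal, standard-library-only Python sketch of the two automated steps described in the abstract: renaming legacy fields to a target schema, and tagging each free-text note with zero or more observation labels. Everything here is an assumption for illustration. The field names in the hypothetical `SCHEMA_MAP`, the label set, and the training notes are invented, and because the abstract does not name the model the authors trained, a binary-relevance multinomial Naive Bayes (one independent present/absent classifier per label) stands in for it.

```python
import math
import re
from collections import Counter

# Hypothetical crosswalks from legacy field names to target-schema names.
# The real Napa, Ridgecrest, and Current field names are not given in the abstract.
SCHEMA_MAP = {
    "napa": {"Obs_Notes": "notes", "Lat": "latitude", "Long": "longitude"},
    "ridgecrest": {"comments": "notes", "y": "latitude", "x": "longitude"},
}


def to_current(record, source):
    """Rename a raw record's fields using the crosswalk for `source`.

    Fields without a mapping are kept under "unmapped" for manual review
    rather than being silently dropped.
    """
    mapping = SCHEMA_MAP[source]
    out = {"unmapped": {}}
    for key, value in record.items():
        if key in mapping:
            out[mapping[key]] = value
        else:
            out["unmapped"][key] = value
    return out


def tokenize(text):
    """Lowercase and keep alphabetic tokens only (numbers are dropped)."""
    return re.findall(r"[a-z]+", text.lower())


class BinaryRelevanceNB:
    """Multi-label classifier built from one binary multinomial Naive Bayes per label.

    A stand-in technique for illustration, not the authors' model.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing strength

    def fit(self, texts, label_sets):
        docs = [tokenize(t) for t in texts]
        self.n = len(docs)
        self.vocab = {w for doc in docs for w in doc}
        self.models = {}
        for label in sorted({lab for labs in label_sets for lab in labs}):
            priors = Counter()  # number of notes with (True) / without (False) this label
            counts = {True: Counter(), False: Counter()}  # word counts per class
            for doc, note_labels in zip(docs, label_sets):
                has_label = label in note_labels
                priors[has_label] += 1
                counts[has_label].update(doc)
            self.models[label] = (priors, counts)
        return self

    def _log_score(self, doc, priors, counts, cls):
        """Smoothed log P(cls) + sum of log P(word | cls) for known words."""
        total = sum(counts[cls].values())
        v = len(self.vocab)
        score = math.log((priors[cls] + self.alpha) / (self.n + 2 * self.alpha))
        for w in doc:
            if w in self.vocab:  # ignore words never seen in training
                score += math.log((counts[cls][w] + self.alpha) / (total + self.alpha * v))
        return score

    def predict(self, text):
        """Return every label whose 'present' score beats its 'absent' score."""
        doc = tokenize(text)
        return {
            label
            for label, (priors, counts) in self.models.items()
            if self._log_score(doc, priors, counts, True)
            > self._log_score(doc, priors, counts, False)
        }


# Hypothetical training notes and their labels.
notes = [
    "en echelon cracks across road with right lateral offset 12 cm",
    "sand boils and liquefaction in field",
    "right lateral offset of fence 30 cm",
    "sand boils next to cracks in levee",
    "no visible damage",
]
labels = [{"cracking", "offset"}, {"liquefaction"}, {"offset"},
          {"liquefaction", "cracking"}, set()]

model = BinaryRelevanceNB().fit(notes, labels)
print(sorted(model.predict("cracks in pavement with right lateral offset")))  # ['cracking', 'offset']
```

Binary relevance is the simplest way to turn a single-label classifier into a multi-label one, at the cost of ignoring correlations between labels (for example, cracking and offset often co-occur in the same note). Keeping unmapped fields instead of discarding them leaves a trail for a geologist to review when a legacy field has no obvious counterpart in the target schema.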

Key Words
machine learning, earthquake rupture field data, 2014 Napa earthquake, 2019 Ridgecrest earthquake sequence

Citation
Pootheri, H. L., Vasa, N., Pauk, E., Huynh, T. T., Blair, L., Thomas, K., & Dawson, T. (2025, 09). Fault lines and field notes: Machine learning-driven parsing of post-earthquake field data into structured observations. Poster Presentation at 2025 SCEC Annual Meeting.


Related Projects & Working Groups
Earthquake Geology