Group A, Poster #233, Ground Motions (GM)
The digital archivist: Automating legacy macroseismic data processing using large language models
Poster Presentation
2025 SCEC Annual Meeting, Poster #233, SCEC Contribution #14630
lues, and descriptions from each report. For reports without MMI values, Gemini inferred an intensity from the damage descriptions. To address coordinate precision limits, addresses were geocoded via Google’s Geocoding API. This workflow yielded over 2,500 geocoded intensity reports for the Daly City earthquake. To assess the accuracy of the workflow, we first evaluated the full dataset by generating a ShakeMap and comparing its interpolated intensities with instrumental data from five strong-motion stations in the San Francisco Bay Area. Using ground-motion-to-intensity conversion equations, we converted the recorded peak ground velocity and acceleration values to MMI. The dataset closely matched the instrumental values, with a mean absolute error of ~0.5 MMI units. When Gemini assigned intensities in 0.5 increments instead of whole numbers, the error decreased to ~0.35 MMI units, demonstrating the importance of prompt engineering for optimizing results. We next separated the dataset into two subsets: reports where MMIs were explicitly provided (“extracted”), and reports where Gemini assigned MMIs from descriptions (“AI-generated”). Using the extracted subset as ground truth, we conducted a pixel-by-pixel comparison between the ShakeMaps interpolated from the two subsets and found a mean absolute error within 0.5 MMI units, indicating that Gemini’s inferred ratings aligned well with the reported ones. We also present preliminary results for the 1971 Sylmar earthquake. Our results demonstrate LLMs’ potential for reliably extracting and analyzing large, unstructured macroseismic datasets. LLMs could offer a scalable solution for rapidly digitizing macroseismic archives, enabling broader use in modern seismic hazard analysis to constrain ground motion models and improve understanding of site effects in urban areas.
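The pixel-by-pixel comparison described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `mean_absolute_error`, the grid layout (lists of rows of MMI values), the treatment of missing cells, and the toy values are all assumptions made for illustration. It only shows how a mean absolute error between two interpolated intensity grids might be computed while skipping cells that lack data in either grid.

```python
import math


def mean_absolute_error(grid_a, grid_b):
    """Pixel-by-pixel mean absolute error between two intensity grids.

    Both grids are lists of rows with identical shape. Cells that are
    None or NaN in either grid (no data) are skipped. This layout is an
    assumption for illustration, not the ShakeMap grid format.
    """
    if len(grid_a) != len(grid_b):
        raise ValueError("grids must have the same number of rows")
    total, count = 0.0, 0
    for row_a, row_b in zip(grid_a, grid_b):
        if len(row_a) != len(row_b):
            raise ValueError("rows must have the same length")
        for a, b in zip(row_a, row_b):
            if a is None or b is None or math.isnan(a) or math.isnan(b):
                continue  # no data at this pixel in at least one grid
            total += abs(a - b)
            count += 1
    if count == 0:
        raise ValueError("no overlapping valid cells")
    return total / count


# Hypothetical toy grids (not data from the study).
extracted = [[5.0, 5.5, 6.0], [4.5, 5.0, float("nan")]]
ai_generated = [[5.5, 5.5, 6.5], [4.0, 5.0, 5.0]]

mae = mean_absolute_error(extracted, ai_generated)
print(f"MAE = {mae:.2f} MMI units")
```

The same function could in principle be applied to the station comparison by passing single-row grids of interpolated and instrument-derived MMI values, assuming the two are aligned station by station.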