AI-ready, multi-modal dataset of offset landforms along the Carrizo segment of the San Andreas fault

Cassandra Brigham, Chelsea P. Scott, Ramon Arrowsmith, & Samuel Johnstone

Submitted September 7, 2025, SCEC Contribution #14940, 2025 SCEC Annual Meeting Poster #TBD

Accurately mapped landforms and reconstructed offsets are critical for understanding fault-slip history and earthquake recurrence along strike-slip faults. Analysis of offset landforms from high-resolution topography and satellite imagery broadens coverage and reveals variability in slip accumulation. Manual mapping and offset calculation provide valuable constraints but yield a limited number of measurements. Uncertainty arises from both epistemic and aleatoric sources; measuring all plausible offsets accounts for variable interpretations but is labor-intensive, which motivates automated methods that complement expert interpretation. Foundation models such as Meta AI’s Segment Anything Model 2 (SAM2) show promising segmentation results but require fine-tuning.

We present an AI-ready, multimodal training dataset of offset landforms along the Carrizo segment of the San Andreas Fault, central California, to support semantic segmentation as a precursor to offset reconstruction. Our workflow (1) identifies fault traces, thalwegs, channel margins, and alluvial fans using SAM2-powered segmentation and (2) reconstructs offsets by correlating and measuring displaced elements using LaDiCaoz_v2. This work provides the training dataset needed to fine-tune SAM2, enabling segmentation and offset reconstruction at scale.
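As a rough illustration of step (2), the sketch below estimates a lateral offset by sliding one fault-parallel elevation profile against another and keeping the shift with the lowest misfit. It is not the LaDiCaoz_v2 implementation. The function name `best_lateral_offset`, its parameters, the RMS misfit, and the mean removal are all assumptions made for illustration.

```python
import numpy as np

def best_lateral_offset(profile_a, profile_b, spacing=1.0, max_shift=None):
    """Grid-search the along-fault shift that best aligns two profiles.

    Hypothetical helper, not part of LaDiCaoz_v2. profile_a and profile_b
    are 1-D elevation profiles sampled at the same positions on opposite
    sides of the fault. A positive offset means the feature in profile_b
    sits at a larger along-fault coordinate than in profile_a.
    """
    a = np.asarray(profile_a, dtype=float)
    b = np.asarray(profile_b, dtype=float)
    if a.ndim != 1 or a.shape != b.shape:
        raise ValueError("profiles must be 1-D arrays of equal length")
    n = a.size
    if max_shift is None:
        max_shift = n // 2
    if not 0 <= max_shift < n:
        raise ValueError("max_shift must be in [0, n)")

    best_shift, best_rms = 0, np.inf
    for s in range(-max_shift, max_shift + 1):
        # Pair sample a[i] with b[i + s] over the overlapping indices.
        if s >= 0:
            seg_a, seg_b = a[: n - s], b[s:]
        else:
            seg_a, seg_b = a[-s:], b[: n + s]
        # Remove each segment's mean so a uniform vertical step across
        # the fault does not dominate the misfit (an assumed choice).
        resid = (seg_a - seg_a.mean()) - (seg_b - seg_b.mean())
        rms = float(np.sqrt(np.mean(resid ** 2)))
        if rms < best_rms:
            best_shift, best_rms = s, rms
    return best_shift * spacing, best_rms
```

In practice the profiles would be extracted from the lidar topography on either side of the mapped fault trace, and the full misfit curve (rather than only its minimum) would be inspected to judge how well constrained the offset is.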

Our dataset comprises 277 sites from established compilations. For each site, we aggregate lidar (≤1 m/pixel) and Planet satellite RGB-NIR data (≤5 m/pixel), and compute derivatives that emphasize geomorphic expression (slope, curvature, aspect, and vegetation/wetness indices). To reduce inter-annotator variability, a single expert creates pixel-level annotations for the geomorphic classes using an open-source annotation platform.
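The derivative layers named above can be sketched with plain NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' processing code: the function names are hypothetical, curvature is approximated by the Laplacian of elevation, aspect assumes a north-up raster whose row index increases southward, and NDVI and NDWI stand in for the unspecified vegetation/wetness indices.

```python
import numpy as np

def terrain_derivatives(dem, cell_size=1.0):
    """Slope, aspect, and a Laplacian curvature proxy from a 2-D DEM.

    Hypothetical helper. Assumes a north-up grid with square cells and
    rows increasing southward.
    """
    dem = np.asarray(dem, dtype=float)
    # np.gradient returns the derivative along rows first, then columns.
    dz_drow, dz_dcol = np.gradient(dem, cell_size)
    slope_deg = np.degrees(np.arctan(np.hypot(dz_dcol, dz_drow)))
    # Downslope direction, clockwise from north. East component is
    # -dz_dcol; north component is +dz_drow because rows point south.
    # Flat cells have no meaningful aspect and come out as 0 here.
    aspect_deg = np.mod(np.degrees(np.arctan2(-dz_dcol, dz_drow)), 360.0)
    # Laplacian of elevation as a simple curvature proxy; values within
    # two cells of the raster edge are less accurate.
    curvature = (np.gradient(dz_dcol, cell_size, axis=1)
                 + np.gradient(dz_drow, cell_size, axis=0))
    return {"slope_deg": slope_deg, "aspect_deg": aspect_deg,
            "curvature": curvature}

def normalized_difference(band_a, band_b):
    """(a - b) / (a + b), with NaN where the denominator is zero."""
    a = np.asarray(band_a, dtype=float)
    b = np.asarray(band_b, dtype=float)
    denom = a + b
    out = np.full(denom.shape, np.nan)
    np.divide(a - b, denom, out=out, where=denom != 0)
    return out

def ndvi(nir, red):
    # Normalized Difference Vegetation Index (assumed vegetation index).
    return normalized_difference(nir, red)

def ndwi(green, nir):
    # Normalized Difference Water Index (assumed wetness index).
    return normalized_difference(green, nir)
```

Because the lidar and satellite rasters have different pixel sizes, the layers would need to be resampled to a common grid before being stacked as model input; that step is omitted here.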

Preliminary experiments show that untuned SAM2 often delineates channel margins and alluvial fans; we use these initial masks to accelerate labeling and to establish baseline benchmarks across lidar-only, satellite-only, and fused inputs. These baselines demonstrate the utility of the training set and outline a path for parameter-efficient fine-tuning tailored to tectonic geomorphology.
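One common way to score such baselines is per-class intersection over union (IoU) between a predicted label raster and the expert annotation. The abstract does not state which metric is used, so IoU is an assumption here, as are the function name and the integer class IDs.

```python
import numpy as np

def per_class_iou(pred, truth, class_ids):
    """Intersection over union for each class in two label rasters.

    Hypothetical helper. pred and truth are integer arrays of identical
    shape; class_ids lists the labels to score. A class absent from both
    rasters gets NaN rather than a misleading 0 or 1.
    """
    pred = np.asarray(pred)
    truth = np.asarray(truth)
    if pred.shape != truth.shape:
        raise ValueError("pred and truth must have the same shape")
    scores = {}
    for c in class_ids:
        p = pred == c
        t = truth == c
        union = np.logical_or(p, t).sum()
        inter = np.logical_and(p, t).sum()
        scores[c] = float(inter / union) if union else float("nan")
    return scores
```

Running the same function on predictions from lidar-only, satellite-only, and fused inputs would give directly comparable scores per geomorphic class.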

We package the dataset with Croissant, an open-source metadata standard for ML datasets, including a machine-readable schema, geospatial metadata, and provenance, in alignment with FAIR and Responsible AI principles. This resource links expert mapping with scalable automation and provides a foundation for fine-tuned, multimodal segmentation models that can ultimately support large numbers of offset reconstructions.

Key Words
strike-slip fault, offset, training dataset, Segment Anything Model, semantic segmentation

Citation
Brigham, C., Scott, C. P., Arrowsmith, R., & Johnstone, S. (2025, 09). AI-ready, multi-modal dataset of offset landforms along the Carrizo segment of the San Andreas fault. Poster Presentation at 2025 SCEC Annual Meeting.


Related Projects & Working Groups
Earthquake Geology