Group B, Poster #200, Earthquake Forecasting and Predictability (EFP)

Exploring new statistical metrics to evaluate the magnitude distribution of earthquake forecasting models

Francesco Serafini, Mark Naylor, Maximilian J. Werner, Leila Mizrahi, Marta Han, Kirsty Bayliss, Pablo C. Iturrieta, & José A. Bayona

Poster Presentation

2024 SCEC Annual Meeting, Poster #200, SCEC Contribution #13875
Evaluating earthquake forecasts is a crucial step in understanding and improving the capabilities of forecasting models. Using specific metrics to assess the consistency between forecasts and data on one particular aspect of the process is important for understanding which aspects of seismicity a model fails to describe and, consequently, for highlighting where new versions of the model should improve. This can be done effectively only by using metrics that are unaffected by inconsistencies in other aspects of the process. The Collaboratory for the Study of Earthquake Predictability (CSEP), which organises earthquake forecasting experiments around the globe, has developed different tests targeting different aspects of the process, such as the N-test and the M-test, which respectively assess the consistency between the observed and forecasted number of events and between the observed and forecasted magnitude distributions. We found that the results of the M-test recently proposed for catalog-based forecasts (composed of a collection of synthetic catalogues generated by the model) depend on the outcome of the N-test, i.e. the two tests do not isolate the desired aspects appropriately. We demonstrate this problem using simulated data and provide a possible solution. We implement the solution in PyCSEP and rerun two analyses (one for Europe, one for Switzerland) in which the M-test was calculated for models failing the N-test, and we analyse how the test results change under the proposed solution. Lastly, we investigate alternative metrics (an unnormalised M-test, the chi-square statistic, the Hellinger distance, the Brier score, and a novel multinomial log-likelihood score) and compare them based on their ability to detect inconsistencies between data and forecasts in various synthetic scenarios. We find that the multinomial log-likelihood (MLL) score outperforms the M-test. We also study how the performance of the scores changes as the cutoff magnitude and the number of observations vary, comparing the M-test and the MLL score with classical statistical tests for differences between distributions; in this case, too, the MLL outperforms its competitors. Therefore, we use the MLL to assess the consistency of the forecasts for Europe and Switzerland and compare the results with those provided by the M-test.
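As a rough illustration of the kind of count-based magnitude comparison discussed above, the sketch below scores the observed magnitude-bin counts under a multinomial whose bin probabilities are pooled from a set of synthetic catalogues. The binning scheme, the pooling step, and the toy Gutenberg-Richter data are assumptions made here for illustration only; the actual MLL score used in the poster and its PyCSEP implementation may be defined differently.

```python
# Illustrative sketch (not the exact MLL definition from the poster): bin the
# magnitudes of an observed catalogue and of many synthetic catalogues, estimate
# forecast bin probabilities from the pooled synthetics, and evaluate the
# multinomial log-likelihood of the observed counts. Because the multinomial
# conditions on the observed total count, only the shape of the magnitude
# distribution is scored, not the number of events (the N-test aspect).
import numpy as np
from scipy.stats import multinomial


def binned_counts(magnitudes, bin_edges):
    """Histogram of magnitudes over fixed bins (e.g. 0.1-unit bins above the cutoff)."""
    counts, _ = np.histogram(magnitudes, bins=bin_edges)
    return counts


def mll_score(observed_mags, synthetic_catalogues, bin_edges, eps=1e-12):
    """Multinomial log-likelihood of observed bin counts under forecast bin probabilities.

    `synthetic_catalogues` is a list of magnitude arrays, one per simulated catalogue.
    Forecast probabilities are the pooled synthetic bin frequencies; `eps` guards
    against empty bins.
    """
    obs_counts = binned_counts(observed_mags, bin_edges)
    pooled = np.concatenate(synthetic_catalogues)
    forecast_counts = binned_counts(pooled, bin_edges).astype(float) + eps
    probs = forecast_counts / forecast_counts.sum()
    return multinomial.logpmf(obs_counts, n=obs_counts.sum(), p=probs)


# Toy usage with synthetic Gutenberg-Richter samples (b-value = 1, cutoff Mw 4.0):
rng = np.random.default_rng(42)
bin_edges = np.arange(4.0, 8.1, 0.1)
observed = 4.0 + rng.exponential(scale=1 / np.log(10), size=300)
forecast_sims = [4.0 + rng.exponential(scale=1 / np.log(10), size=rng.poisson(300))
                 for _ in range(1000)]
print("MLL score:", mll_score(observed, forecast_sims, bin_edges))
```

Conditioning on the observed number of events is one simple way to decouple the magnitude comparison from the event-count comparison, which is the separation of concerns between the M-test and the N-test that the abstract argues for.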