Ranking earthquake forecasts using proper scoring rules: binary events in a low probability environment

Francesco Serafini, Mark Naylor, Finn Lindgren, Maximilian J. Werner, & Ian Main

Published March 28, 2022, SCEC Contribution #11791

Operational earthquake forecasting for risk management and communication during seismic sequences depends on our ability to select an optimal forecasting model. To do this, we need to compare the performance of competing models in prospective experiments, and to rank their performance according to the outcome using a fair, reproducible and reliable method, usually in a low-probability environment. The Collaboratory for the Study of Earthquake Predictability conducts prospective earthquake forecasting experiments around the globe. In this framework, it is crucial that the metrics used to rank the competing forecasts are ‘proper’, meaning that, on average, they prefer the data-generating model. We prove that the Parimutuel Gambling score, proposed, and in some cases applied, as a metric for comparing probabilistic seismicity forecasts, is in general ‘improper’. In the special case where it is proper, we show it can still be used improperly. We demonstrate these conclusions both analytically and graphically, providing a set of simulation-based techniques that can be used to assess whether a score is proper. These require only a data-generating model and at least two forecasts to compare. We compare the Parimutuel Gambling score’s performance with two commonly used proper scores (the Brier and logarithmic scores), using confidence intervals to account for the uncertainty around the observed score difference. We suggest that using confidence intervals enables a rigorous approach to distinguishing between the predictive skills of candidate forecasts, in addition to their rankings. Our analysis shows that the Parimutuel Gambling score is biased, and that the direction of the bias depends on the forecasts taking part in the experiment. Our findings suggest that the Parimutuel Gambling score should not be used to distinguish between multiple competing forecasts, and that care should be taken in the case where only two are being compared.
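The simulation-based idea described in the abstract can be sketched as follows: simulate binary outcomes from a known data-generating model, then check that a candidate score, on average, prefers the true forecast over a competitor. The probabilities, sample size, and function names below are illustrative assumptions, not values from the paper; the Brier and logarithmic scores shown are the two proper scores the abstract names.

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative setup: a low-probability binary event (values assumed, not from the paper)
p_true = 0.05       # data-generating (true) event probability
p_alt = 0.20        # a competing, miscalibrated forecast
n = 200_000         # number of simulated outcomes

# Simulate binary outcomes from the data-generating model
y = rng.random(n) < p_true

def brier(f, y):
    """Brier score for a binary forecast f against outcomes y (lower is better)."""
    return (f - y) ** 2

def log_score(f, y):
    """Logarithmic (negative log-likelihood) score (lower is better)."""
    return -(y * np.log(f) + (1 - y) * np.log(1 - f))

# A proper score prefers the data-generating forecast on average:
# its mean score for p_true should be lower than for any competing forecast.
for name, score in [("Brier", brier), ("Log", log_score)]:
    s_true = score(p_true, y).mean()
    s_alt = score(p_alt, y).mean()
    print(f"{name}: true forecast {s_true:.4f} vs alternative {s_alt:.4f}")
```

Running the same Monte Carlo comparison with an improper score would reveal cases where the competing forecast is preferred on average even though it did not generate the data, which is the failure mode the paper documents for the Parimutuel Gambling score.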

Key Words
forecasting, probabilistic scoring, model ranking

Citation
Serafini, F., Naylor, M., Lindgren, F., Werner, M. J., & Main, I. (2022). Ranking earthquake forecasts using proper scoring rules: binary events in a low probability environment. Geophysical Journal International, 230(2), 1419-1440. doi: 10.1093/gji/ggac124. https://doi.org/10.1093/gji/ggac124


Related Projects & Working Groups
Earthquake Forecasting and Predictability (EFP), CSEP