Artificial intelligence for climate prediction of extremes: State of the art, challenges, and future perspectives
This review surveys achievements and challenges in the use of artificial intelligence (AI) techniques to improve the prediction of extremes on subseasonal to decadal timescales. Extreme events such as heat waves and cold spells, droughts, heavy rain, and storms are particularly challenging to predict accurately because of their rarity and chaotic nature, and because of model limitations. AI techniques have shown great potential both to improve the prediction of extreme events and to uncover their links to large-scale and local drivers. Machine and deep learning have been explored to enhance prediction, while causal discovery and explainable AI have been tested to improve our understanding of the processes underlying predictability. Hybrid predictions, combining AI, which can reveal unknown spatiotemporal connections from data, with climate models, which provide the theoretical foundation and interpretability of the physical world, have shown that improving the prediction skill of extremes on climate-relevant timescales is possible.
To improve trust in AI-based forecasts of extremes, the authors recommend the following “good practices”:
- Data, workflows, and analyses should be transparent and easily reproducible across different big-volume datasets. Technically, this can be achieved by linking open-source software to big climate data platforms. Studies should then provide access to the source code, the trained AI model (via an appropriate repository, e.g., GitHub), and the exact data used, including all preprocessing and postprocessing steps (on publicly accessible data platforms, e.g., the Climate Data Store); a minimal packaging sketch is given after this list.
- Studies should use standardized benchmark datasets and multiple skill metrics. Relying on a single or uncritically chosen skill metric (e.g., correlation or area under the ROC curve) can easily lead to inflated skill estimates; the second sketch after this list illustrates this.
- Validation should be described step by step, and preferably multiple cross-validation approaches should be tested, with attention to possible information leakage from training to test data. Ideally, all preprocessing (deseasonalizing, standardizing, etc.) is performed out of sample, although in practice this can be difficult given the lack of independent data samples; the third sketch after this list shows one way to achieve this.
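As a concrete illustration of the first practice, the sketch below shows one hypothetical way to package a trained model together with the metadata needed to rerun it: the serialized model, the random seed, and checksums of the exact input files. The function names (`package_run`, `sha256_of`) and the artifact layout are illustrative, not a prescribed standard; in practice the resulting artifacts would be deposited on platforms such as GitHub or the Climate Data Store, as recommended above.

```python
# Hypothetical sketch: packaging a trained model with the metadata needed to
# reproduce a study (random seed and checksums of the exact input data files).
import hashlib
import json
import pathlib
import pickle


def sha256_of(path: pathlib.Path) -> str:
    """Checksum so readers can verify they use exactly the same data files."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def package_run(model, data_files, out_dir="run_artifacts", seed=42):
    """Serialize the trained model and write a reproducibility manifest."""
    out = pathlib.Path(out_dir)
    out.mkdir(exist_ok=True)
    # Archive the trained model itself, not only the training script.
    with open(out / "model.pkl", "wb") as f:
        pickle.dump(model, f)
    manifest = {
        "random_seed": seed,  # fix and report all seeds used in training
        "data_checksums": {str(p): sha256_of(pathlib.Path(p)) for p in data_files},
    }
    (out / "manifest.json").write_text(json.dumps(manifest, indent=2))
```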
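The second practice can be demonstrated with synthetic data: the hypothetical forecast below ranks rare extremes almost perfectly (ROC AUC near 1) yet is so miscalibrated that its Brier skill score is worse than always forecasting the climatological base rate. All numbers are contrived for illustration.

```python
# Illustrative sketch: a single skill metric can be misleading for rare extremes.
import numpy as np
from sklearn.metrics import brier_score_loss, roc_auc_score

rng = np.random.default_rng(0)
n = 10_000
event = rng.random(n) < 0.05  # rare extreme with a 5% base rate

# Hypothetical forecast: near-perfect ranking of events above non-events,
# but badly miscalibrated (~0.45 probability assigned to every non-event).
prob = np.clip(0.45 + 0.5 * event + 0.05 * rng.standard_normal(n), 0.0, 1.0)

auc = roc_auc_score(event, prob)    # measures discrimination only
bs = brier_score_loss(event, prob)  # measures accuracy of the probabilities
bs_clim = brier_score_loss(event, np.full(n, event.mean()))
bss = 1.0 - bs / bs_clim            # skill relative to climatology

print(f"ROC AUC           = {auc:.2f}")  # ~1.00: looks highly skilful
print(f"Brier skill score = {bss:.2f}")  # negative: worse than climatology
```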
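For the third practice, a minimal sketch of leakage-aware validation, assuming daily predictors `X` and a binary extreme-event label `y` (both synthetic here): a scikit-learn `Pipeline` ensures the standardization is fit on each training fold only, and `TimeSeriesSplit` keeps test samples strictly after training samples.

```python
# Minimal sketch: out-of-sample preprocessing inside chronological cross-validation.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import TimeSeriesSplit, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.standard_normal((3650, 20))  # ten years of synthetic daily predictors
y = (X[:, 0] + 0.5 * rng.standard_normal(3650)) > 1.5  # synthetic rare events

# The scaler is part of the pipeline, so it is refit on each training fold
# only; test folds are transformed with training-fold statistics.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
cv = TimeSeriesSplit(n_splits=5)  # test data always later than training data

scores = cross_val_score(pipe, X, y, cv=cv, scoring="roc_auc")
print(scores.round(2))
```

Deseasonalizing would need the same treatment: the climatological statistics should be estimated on the training fold and then applied to the test fold, rather than computed once on the full record.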