A novel heuristic method for detecting overfit in unsupervised classification of climate model data

Bibliographic Details
Published in: Environmental Data Science
Main Authors: Boland, Emma J. D., Atkinson, Erin, Jones, Dani C.
Other Authors: Natural Environment Research Council, UK Research and Innovation
Format: Article in Journal/Newspaper
Language: English
Published: Cambridge University Press (CUP) 2023
Subjects:
Online Access: http://dx.doi.org/10.1017/eds.2023.40
https://www.cambridge.org/core/services/aop-cambridge-core/content/view/S2634460223000407
Institution: Open Polar
Abstract: Unsupervised classification is becoming an increasingly common method to objectively identify coherent structures within both observed and modelled climate data. However, in most applications using this method, the user must choose the number of classes into which the data are to be sorted in advance. Typically, a combination of statistical methods and expertise is used to choose the appropriate number of classes for a given study; however, it may not be possible to identify a single “optimal” number of classes. In this work, we present a heuristic method, the ensemble difference criterion, for unambiguously determining the maximum number of classes supported by model data ensembles. This method requires robustness in the class definition between simulated ensembles of the system of interest. For demonstration, we apply this to the clustering of Southern Ocean potential temperatures in a CMIP6 climate model, and show that the data support between four and seven classes of a Gaussian mixture model.
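The abstract describes the approach only at a high level: fit a Gaussian mixture model with a chosen number of classes to each simulated ensemble member, and ask whether the class definitions stay robust between members. The toy Python sketch below illustrates that general idea under clearly stated assumptions. The data are 1-D synthetic values rather than Southern Ocean potential temperature, the mixture is fitted with a small hand-written EM loop, and robustness is scored as the fraction of pooled points that two members' fits label differently, compared against a threshold `tol` chosen purely for illustration. That score is an assumption and is not the paper's ensemble difference criterion; the names `fit_gmm_1d`, `classify`, `label_disagreement`, `max_supported_k` and `synthetic_member` are hypothetical.

```python
import math
import random

LOG_2PI = math.log(2.0 * math.pi)


def _log_pdf(v, m, s):
    """Log density of a normal distribution with mean m and standard deviation s."""
    z = (v - m) / s
    return -0.5 * z * z - math.log(s) - 0.5 * LOG_2PI


def fit_gmm_1d(x, k, n_iter=60):
    """Fit a k-component 1-D Gaussian mixture by EM; components are returned sorted by mean."""
    xs = sorted(x)
    n = len(xs)
    mu = sum(xs) / n
    sd0 = math.sqrt(sum((v - mu) ** 2 for v in xs) / n) or 1.0
    # Deterministic start: means at evenly spaced quantiles, equal spreads and weights.
    means = [xs[int((j + 0.5) * n / k)] for j in range(k)]
    sds = [sd0] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibilities, computed in log space to avoid underflow.
        resp = []
        for v in xs:
            lp = [math.log(w) + _log_pdf(v, m, s) for w, m, s in zip(weights, means, sds)]
            top = max(lp)
            e = [math.exp(lj - top) for lj in lp]
            tot = sum(e)
            resp.append([ej / tot for ej in e])
        # M-step: re-estimate weights, means and standard deviations.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * v for r, v in zip(resp, xs)) / nj
            var = sum(r[j] * (v - means[j]) ** 2 for r, v in zip(resp, xs)) / nj
            sds[j] = max(math.sqrt(var), 1e-3)  # floor stops a component collapsing
    order = sorted(range(k), key=lambda j: means[j])
    return ([weights[j] for j in order], [means[j] for j in order], [sds[j] for j in order])


def classify(model, x):
    """Assign each value to its most probable component (label 0 = lowest mean)."""
    weights, means, sds = model
    labels = []
    for v in x:
        lp = [math.log(w) + _log_pdf(v, m, s) for w, m, s in zip(weights, means, sds)]
        labels.append(max(range(len(lp)), key=lp.__getitem__))
    return labels


def label_disagreement(model_a, model_b, x):
    """Fraction of points that two fits label differently (labels matched by mean order)."""
    la, lb = classify(model_a, x), classify(model_b, x)
    return sum(a != b for a, b in zip(la, lb)) / len(x)


def max_supported_k(member_a, member_b, k_max=6, tol=0.05):
    """Largest k such that fits to both members agree within tol for every k' <= k."""
    pooled = list(member_a) + list(member_b)
    best = 1
    for k in range(1, k_max + 1):
        d = label_disagreement(fit_gmm_1d(member_a, k), fit_gmm_1d(member_b, k), pooled)
        if d > tol:
            break
        best = k
    return best


def synthetic_member(rng, n=300):
    """Hypothetical stand-in for one ensemble member: samples from three Gaussian regimes."""
    return [rng.gauss(rng.choice([-4.0, 0.0, 4.0]), 1.0) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    a, b = synthetic_member(rng), synthetic_member(rng)
    print("largest agreeing k:", max_supported_k(a, b, k_max=5))
```

In a real multi-dimensional application, a library fit such as scikit-learn's `GaussianMixture` would normally replace the hand-written EM loop, and matching classes between members would need a different step, since ordering components by their means only makes sense in one dimension.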
Geographic: Southern Ocean
Volume: 2
ISSN: 2634-4602
License: http://creativecommons.org/licenses/by/4.0