Understanding Internal Cluster Variability Through Subcluster Metric Analysis in a Geophysical Context

Bibliographic Details
Published in: Earth and Space Science
Main Authors: A. J. Schuddeboom, A. J. McDonald
Format: Article in Journal/Newspaper
Language: English
Published: American Geophysical Union (AGU) 2023
Online Access: https://doi.org/10.1029/2022EA002373
https://doaj.org/article/6fafa3b5a66c484fa1de9e850c90e361
Description
Summary: Clustering algorithms are commonly used for inspecting the behavior of clouds in both model and satellite data sets. Often overlooked in cluster analysis is the variability that occurs within the clusters generated. This is particularly important in geophysics, where clusters are often generated with a focus on interpretability over mathematical optimization. Two metrics, the Davies‐Bouldin index and the subsom entropy, are used to identify clusters with large internal variability. These metrics are applied to an example set of clusters from prior research that were generated using cloud top pressure‐cloud optical thickness joint histograms from the Moderate Resolution Imaging Spectroradiometer data set. Applying these metrics to the clusters identifies one cluster in particular as a major outlier. Examining the calculations behind these metrics in more detail provides further information about the internal variability of the clusters. The clusters are also examined over several geographic regions, showing mostly consistent behavior. There are, however, some large anomalies, such as the behavior of the clear-sky cluster or of several different clusters over the Arctic Ocean. To aid our interpretation of these results, two clusters are chosen for a detailed analysis of their subclusters. The geographic distributions and radiative properties of these subclusters are examined and clearly show that the subclusters have physically distinct behavior. This result illustrates that these metrics are capable of determining when a cluster contains physically distinct subclusters, and demonstrates their potential utility if applied to other geophysical data sets.
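The summary names two internal-variability metrics without defining them. The following is a minimal sketch, not the authors' implementation, of how such metrics could be computed for the members of a single cluster split into subclusters. It assumes each member (e.g. one flattened cloud top pressure-cloud optical thickness histogram) is a feature vector, that distances are Euclidean, and that subcluster labels are already available. The Davies-Bouldin index follows its standard definition (mean over groups of the worst ratio of summed within-group scatter to centroid separation). The subsom entropy is only approximated here, as an assumption, by the Shannon entropy of how members are spread across subclusters; the paper's exact definition may differ. The function names `davies_bouldin` and `membership_entropy` are hypothetical.

```python
import numpy as np


def davies_bouldin(points: np.ndarray, labels: np.ndarray) -> float:
    """Standard Davies-Bouldin index for a labelled partition.

    Lower values mean subclusters are compact and well separated.
    Assumes Euclidean distance and at least two distinct labels.
    """
    groups = np.unique(labels)
    if len(groups) < 2:
        raise ValueError("need at least two subclusters")
    # Centroid of each subcluster
    centroids = np.array([points[labels == g].mean(axis=0) for g in groups])
    # S_i: mean distance from each member to its subcluster centroid
    scatter = np.array([
        np.linalg.norm(points[labels == g] - c, axis=1).mean()
        for g, c in zip(groups, centroids)
    ])
    # M_ij: pairwise distances between centroids (assumed non-zero)
    sep = np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=2)
    k = len(groups)
    worst = np.empty(k)
    for i in range(k):
        # D_i = max over j != i of (S_i + S_j) / M_ij
        worst[i] = max((scatter[i] + scatter[j]) / sep[i, j]
                       for j in range(k) if j != i)
    return float(worst.mean())


def membership_entropy(labels: np.ndarray) -> float:
    """Shannon entropy (bits) of the fraction of members in each subcluster.

    ASSUMPTION: a stand-in for the paper's subsom entropy, whose exact
    definition is not given in the summary.
    """
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


if __name__ == "__main__":
    # Toy data: two tight, well-separated subclusters
    pts = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
    lab = np.array([0, 0, 1, 1])
    print(davies_bouldin(pts, lab))   # 0.2
    print(membership_entropy(lab))    # 1.0
```

In this toy case each subcluster has a scatter of 1 and the centroids are 10 apart, giving (1 + 1) / 10 = 0.2. Under these assumptions, a cluster whose subclusters overlap heavily would instead yield a larger Davies-Bouldin value, which is the kind of signal the summary describes using to flag clusters with large internal variability.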