On the Generalization Ability of Data-Driven Models in the Problem of Total Cloud Cover Retrieval

Total Cloud Cover (TCC) retrieval from ground-based optical imagery is a problem that has been tackled by several generations of researchers. The number of human-designed algorithms for the estimation of TCC grows every year. However, there has been no considerable progress in terms of quality, most...


Bibliographic Details
Published in:	Remote Sensing
Main Authors:	Mikhail Krinitskiy, Marina Aleksandrova, Polina Verezemskaya, Sergey Gulev, Alexey Sinitsyn, Nadezhda Kovaleva, Alexander Gavrikov
Format:	Text
Language:	English
Published:	Multidisciplinary Digital Publishing Institute 2021
Subjects:	total cloud cover; all-sky camera; algorithms assessment; neural networks; machine learning; data-driven approach; Arctic; Arctic Ocean; Indian
Online Access:	https://doi.org/10.3390/rs13020326

id	ftmdpi:oai:mdpi.com:/2072-4292/13/2/326/
record_format	openpolar
spelling	ftmdpi:oai:mdpi.com:/2072-4292/13/2/326/ 2023-08-20T04:05:01+02:00 agris 2021-01-19 application/pdf EN eng 2023-08-01T00:53:48Z
institution	Open Polar
collection	MDPI Open Access Publishing
op_collection_id	ftmdpi
language	English
topic	total cloud cover all-sky camera algorithms assessment neural networks machine learning data-driven approach
description	Total Cloud Cover (TCC) retrieval from ground-based optical imagery is a problem that has been tackled by several generations of researchers. The number of human-designed algorithms for the estimation of TCC grows every year. However, there has been no considerable progress in terms of quality, mostly due to the lack of a systematic approach to the design of the algorithms, to the assessment of their generalization ability, and to the assessment of the TCC retrieval quality. In this study, we discuss the optimization nature of data-driven schemes for TCC retrieval. To compare the algorithms, we propose a framework for the assessment of the algorithms’ characteristics. We present several new algorithms that are based on deep learning techniques: a model for outlier filtering and a few models for TCC retrieval from all-sky imagery. To train and assess the data-driven algorithms of this study, we present the Dataset of All-Sky Imagery over the Ocean (DASIO), which contains over one million all-sky optical images of the visible sky dome taken in various regions of the world ocean. The research campaigns that contributed to the DASIO collection took place in the Atlantic Ocean, the Indian Ocean, the Red and Mediterranean Seas, and the Arctic Ocean. The optical imagery collected during these missions is accompanied by standard meteorological observations of cloudiness characteristics made by experienced observers. We assess the generalization ability of the presented models in several scenarios that differ in terms of the regions selected for the training and test subsets. As a result, we demonstrate that our models based on convolutional neural networks deliver superior quality compared to all previously published approaches. As a key result, we demonstrate a considerable drop in the models' ability to generalize from the training data in the case of a strong covariate shift between the training and test subsets of imagery, which may occur in the case of region-aware subsampling.
format	Text
author	Mikhail Krinitskiy Marina Aleksandrova Polina Verezemskaya Sergey Gulev Alexey Sinitsyn Nadezhda Kovaleva Alexander Gavrikov
author_sort	Mikhail Krinitskiy
title	On the Generalization Ability of Data-Driven Models in the Problem of Total Cloud Cover Retrieval
title_sort	on the generalization ability of data-driven models in the problem of total cloud cover retrieval
publisher	Multidisciplinary Digital Publishing Institute
publishDate	2021
url	https://doi.org/10.3390/rs13020326
op_coverage	agris
geographic	Arctic Arctic Ocean Indian
genre	Arctic Arctic Ocean
op_source	Remote Sensing; Volume 13; Issue 2; Pages: 326
op_relation	Atmospheric Remote Sensing https://dx.doi.org/10.3390/rs13020326
op_rights	https://creativecommons.org/licenses/by/4.0/
op_doi	https://doi.org/10.3390/rs13020326
container_title	Remote Sensing
container_volume	13
container_issue	2
container_start_page	326
_version_	1774715444619378688


Similar Items