On the Generalization Ability of Data-Driven Models in the Problem of Total Cloud Cover Retrieval

Total Cloud Cover (TCC) retrieval from ground-based optical imagery is a problem that has been tackled by several generations of researchers. The number of human-designed algorithms for the estimation of TCC grows every year. However, there has been no considerable progress in terms of quality, most...


Bibliographic Details
Published in: Remote Sensing
Main Authors: Mikhail Krinitskiy, Marina Aleksandrova, Polina Verezemskaya, Sergey Gulev, Alexey Sinitsyn, Nadezhda Kovaleva, Alexander Gavrikov
Format: Article in Journal/Newspaper
Language: English
Published: MDPI AG 2021
Subjects:
Q
Online Access: https://doi.org/10.3390/rs13020326
https://doaj.org/article/fc50fc8eee494c24b72d8a590a41c655
id ftdoajarticles:oai:doaj.org/article:fc50fc8eee494c24b72d8a590a41c655
record_format openpolar
institution Open Polar
collection Directory of Open Access Journals: DOAJ Articles
op_collection_id ftdoajarticles
language English
topic total cloud cover
all-sky camera
algorithms assessment
neural networks
machine learning
data-driven approach
Science
Q
description Total Cloud Cover (TCC) retrieval from ground-based optical imagery is a problem that has been tackled by several generations of researchers. The number of human-designed algorithms for the estimation of TCC grows every year. However, there has been no considerable progress in terms of quality, mostly due to the lack of a systematic approach to the design of the algorithms, to the assessment of their generalization ability, and to the assessment of the TCC retrieval quality. In this study, we discuss the optimization nature of data-driven schemes for TCC retrieval. In order to compare the algorithms, we propose a framework for the assessment of the algorithms’ characteristics. We present several new algorithms that are based on deep learning techniques: a model for outlier filtering, and a few models for TCC retrieval from all-sky imagery. To train and assess the data-driven algorithms of this study, we present the Dataset of All-Sky Imagery over the Ocean (DASIO), containing over one million all-sky optical images of the visible sky dome taken in various regions of the world ocean. The research campaigns that contributed to the DASIO collection took place in the Atlantic Ocean, the Indian Ocean, the Red and Mediterranean Seas, and the Arctic Ocean. Optical imagery collected during these missions is accompanied by standard meteorological observations of cloudiness characteristics made by experienced observers. We assess the generalization ability of the presented models in several scenarios that differ in terms of the regions selected for the train and test subsets. As a result, we demonstrate that our models based on convolutional neural networks deliver superior quality compared to all previously published approaches. As a key result, we demonstrate a considerable drop in the ability to generalize from the training data in the case of a strong covariate shift between the training and test subsets of imagery, which may occur in the case of region-aware subsampling.
format Article in Journal/Newspaper
author Mikhail Krinitskiy
Marina Aleksandrova
Polina Verezemskaya
Sergey Gulev
Alexey Sinitsyn
Nadezhda Kovaleva
Alexander Gavrikov
title On the Generalization Ability of Data-Driven Models in the Problem of Total Cloud Cover Retrieval
publisher MDPI AG
publishDate 2021
url https://doi.org/10.3390/rs13020326
https://doaj.org/article/fc50fc8eee494c24b72d8a590a41c655
geographic Arctic
Arctic Ocean
Indian
genre Arctic
Arctic Ocean
op_source Remote Sensing, Vol 13, Iss 2, p 326 (2021)
op_relation https://www.mdpi.com/2072-4292/13/2/326
https://doaj.org/toc/2072-4292
doi:10.3390/rs13020326
2072-4292
https://doaj.org/article/fc50fc8eee494c24b72d8a590a41c655
op_doi https://doi.org/10.3390/rs13020326
container_title Remote Sensing
container_volume 13
container_issue 2
container_start_page 326