How should we aggregate data? Methods accounting for the numerical distributions, with an assessment of aerosol optical depth

Many applications of geophysical data – whether from surface observations, satellite retrievals, or model simulations – rely on aggregates produced at coarser spatial (e.g. degrees) and/or temporal (e.g. daily and monthly) resolution than the highest available from the technique. Almost all of these...

Bibliographic Details
Published in: Atmospheric Chemistry and Physics
Main Authors: Sayer, Andrew M., Knobelspiesse, Kirk D.
Format: Article in Journal/Newspaper
Language: English
Published: Copernicus Publications 2019
Online Access: https://doi.org/10.5194/acp-19-15023-2019
https://noa.gwlb.de/receive/cop_mods_00049766
https://noa.gwlb.de/servlets/MCRFileNodeServlet/cop_derivate_00049385/acp-19-15007-2019.pdf
https://acp.copernicus.org/articles/19/15023/2019/acp-19-15023-2019.pdf
id ftnonlinearchiv:oai:noa.gwlb.de:cop_mods_00049766
record_format openpolar
institution Open Polar
collection Niedersächsisches Online-Archiv NOA
op_collection_id ftnonlinearchiv
language English
topic article
publisher's publication (Verlagsveröffentlichung)
description Many applications of geophysical data – whether from surface observations, satellite retrievals, or model simulations – rely on aggregates produced at coarser spatial (e.g. degrees) and/or temporal (e.g. daily and monthly) resolution than the highest available from the technique. Almost all of these aggregates report the arithmetic mean and standard deviation as summary statistics, which are what data users employ in their analyses. These statistics are most meaningful for normally distributed data; however, for some quantities, such as aerosol optical depth (AOD), it is well known that distributions on large scales are closer to log-normal, for which a geometric mean and standard deviation would be more appropriate. This study presents a method of assessing whether a given sample of data is more consistent with an underlying normal or log-normal distribution, using the Shapiro–Wilk test, and tests AOD frequency distributions on spatial scales of 1° and daily, monthly, and seasonal temporal scales. A broadly consistent picture is observed using Aerosol Robotic Network (AERONET), Multiangle Imaging SpectroRadiometer (MISR), Moderate Resolution Imaging Spectroradiometer (MODIS), and Goddard Earth Observing System Version 5 Nature Run (G5NR) data. These data sets are complementary: AERONET has the highest AOD accuracy but is sparse, and MISR and MODIS represent different satellite retrieval techniques and sampling. As a model simulation, G5NR is spatiotemporally complete. As timescales increase from days to months to seasons, data become increasingly consistent with log-normal rather than normal distributions, and the differences between arithmetic- and geometric-mean AOD become larger, with the geometric mean becoming systematically smaller. Assuming normality systematically overstates both the typical level of AOD and its variability.
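The normal versus log-normal comparison described above can be illustrated with a minimal Python sketch. It assumes SciPy is available and uses its `scipy.stats.shapiro` function. The decision rule (the larger Shapiro–Wilk W statistic wins), the helper name `more_consistent_with`, and the synthetic sample are illustrative assumptions, not necessarily the procedure or data used in the paper.

```python
import math
import random

from scipy.stats import shapiro  # Shapiro-Wilk test for normality


def more_consistent_with(values):
    """Return 'normal' or 'log-normal', whichever yields the larger W.

    Assumption: comparing the W statistic of the raw values with that of
    their logarithms is an illustrative rule, not necessarily the exact
    criterion used in the paper. Values must be strictly positive.
    """
    w_linear, _p_linear = shapiro(values)
    w_log, _p_log = shapiro([math.log(v) for v in values])
    return "log-normal" if w_log > w_linear else "normal"


# Synthetic, strictly positive "AOD-like" sample (not real data),
# drawn from a log-normal distribution with a fixed seed.
rng = random.Random(0)
sample = [rng.lognormvariate(math.log(0.15), 0.8) for _ in range(200)]

label = more_consistent_with(sample)
print(label)
```

W lies between 0 and 1, with values nearer 1 indicating better agreement with a normal distribution, so applying the test to both the values and their logarithms gives two directly comparable numbers.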
There is considerable regional heterogeneity in the results: in low-AOD regions such as the open ocean and mountains, often the AOD difference is small enough (<0.01) to be unimportant for many applications, especially on daily timescales. However, in continental outflow regions and near source regions over land, and on monthly or seasonal timescales, the difference is frequently larger than the Global Climate Observing System (GCOS) goal uncertainty in a climate data record (the larger of 0.03 or 10 %). This is important because it shows that the sensitivity to an averaging method can and often does introduce systematic effects larger than the total goal GCOS uncertainty. Using three well-studied AERONET sites, the magnitude of estimated AOD trends is shown to be sensitive to the choice of arithmetic vs. geometric means, although the signs are consistent. The main recommendations from the study are that (1) the distribution of a geophysical quantity should be analysed in order to assess how best to aggregate it, (2) ideally AOD aggregates such as satellite level 3 products (but also ground-based data and model simulations) should report a geometric-mean or median AOD rather than (or in addition to) arithmetic-mean AOD, and (3) as this is unlikely in the short term due to the computational burden involved, users can calculate geometric-mean monthly aggregates from widely available daily mean data as a stopgap, as daily aggregates are less sensitive to the choice of aggregation scheme than monthly or seasonal aggregates. Furthermore, distribution shapes can have implications for the validity of statistical metrics often used for comparison and evaluation of data sets. The methodology is not restricted to AOD and can be applied to other quantities.
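The stopgap in recommendation (3), and the comparison with the GCOS goal uncertainty, can be sketched in a few lines of standard-library Python. The daily values below are hypothetical, the helper names are invented for illustration, and applying the 10 % term to the arithmetic-mean AOD is an assumption, since the abstract does not say which AOD the percentage refers to.

```python
import math
from statistics import fmean


def geometric_mean(values):
    """Exponential of the mean of the logs; needs strictly positive values."""
    return math.exp(fmean([math.log(v) for v in values]))


def gcos_goal(aod):
    """Goal uncertainty quoted in the abstract: the larger of 0.03 or 10 %.

    Assumption: the 10 % is taken relative to the AOD value passed in.
    """
    return max(0.03, 0.10 * aod)


# Hypothetical daily mean AOD values for one month at one site (not real data).
daily_aod = [0.05, 0.06, 0.08, 0.10, 0.12, 0.15, 0.20, 0.30, 0.60, 1.20]

arith = fmean(daily_aod)            # conventional monthly aggregate
geom = geometric_mean(daily_aod)    # monthly aggregate suited to log-normal data
difference = arith - geom
exceeds_goal = difference > gcos_goal(arith)

print(f"arithmetic={arith:.3f} geometric={geom:.3f} "
      f"difference={difference:.3f} exceeds_goal={exceeds_goal}")
```

For positive data the geometric mean can never exceed the arithmetic mean, so the difference is non-negative by construction. Its size depends on how skewed the sample is, which is why a few high-AOD days in a month can matter.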
format Article in Journal/Newspaper
author Sayer, Andrew M.
Knobelspiesse, Kirk D.
title How should we aggregate data? Methods accounting for the numerical distributions, with an assessment of aerosol optical depth
publisher Copernicus Publications
publishDate 2019
genre Aerosol Robotic Network
genre_facet Aerosol Robotic Network
op_relation Atmospheric Chemistry and Physics -- http://www.atmos-chem-phys.net/volumes_and_issues.html -- http://www.bibliothek.uni-regensburg.de/ezeit/?2069847 -- 1680-7324
op_rights https://creativecommons.org/licenses/by/4.0/
unrestricted (uneingeschränkt)
info:eu-repo/semantics/openAccess
op_rightsnorm CC-BY
op_doi https://doi.org/10.5194/acp-19-15023-2019
container_title Atmospheric Chemistry and Physics
container_volume 19
container_issue 23
container_start_page 15023
op_container_end_page 15048
_version_ 1766046022072532992
spelling ftnonlinearchiv:oai:noa.gwlb.de:cop_mods_00049766 2023-05-15T13:07:17+02:00 2019-12 electronic eng Text doc-type:article 2022-02-08T22:37:10Z