IPB-MSA&SO4: a daily 0.25° resolution dataset of In-situ Produced Biogenic Methanesulfonic Acid and Sulfate over the North Atlantic during 1998–2022 based on machine learning


Bibliographic Details
Main Authors: Mansour, Karam, Decesari, Stefano, Ceburnis, Darius, Ovadnevaite, Jurgita, Russell, Lynn M., Paglione, Marco, Poulain, Laurent, Huang, Shan, O'Dowd, Colin, Rinaldi, Matteo
Format: Text
Language: English
Published: 2023
Online Access: https://doi.org/10.5194/essd-2023-352
https://essd.copernicus.org/preprints/essd-2023-352/
Description
Summary: Accurate long-term records of marine-derived biogenic sulfur aerosol concentrations at high spatial and temporal resolution are critical for a wide range of studies, including climatology, trend analysis, model evaluation, quantifying their contribution to the aerosol burden, elucidating their radiative impacts, and providing boundary conditions for regional models. By applying machine learning algorithms, we constructed the first publicly available, daily gridded dataset of in-situ produced biogenic methanesulfonic acid (MSA) and sulfate (SO4) concentrations covering the North Atlantic Ocean. The dataset has a high spatial resolution of 0.25° × 0.25° and spans 25 years (1998–2022), far exceeding what observations alone could achieve in both space and time. The machine learning models were built by combining in-situ observations of sulfur aerosol at the Mace Head research station, on the west coast of Ireland, and from NAAMES cruises in the NW Atlantic with the constructed sea-to-air dimethylsulfide flux (F_DMS) and ECMWF-ERA5 reanalysis datasets. To determine the optimal regression method, we employed four machine learning model types: support vector machines, ensemble, Gaussian process, and artificial neural networks. A comparison of the mean absolute error (MAE), root mean square error (RMSE), and coefficient of determination (R²) revealed that Gaussian process regression (GPR) was the most effective algorithm, outperforming the other models in simulating the biogenic MSA and SO4 concentrations. For predicting daily MSA (SO4), GPR displayed the highest R² value of 0.86 (0.72) and the lowest MAE of 0.014 (0.10) µg m⁻³. The GPR partial dependence analysis suggests that the relationships between predictors and MSA and SO4 concentrations are complex rather than linear.
Using the GPR algorithm, we produced a high-resolution daily dataset of In-situ Produced Biogenic MSA and SO4 sea-level concentrations over the North Atlantic, which we named IPB-MSA&SO4. The obtained ...
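
The summary ranks candidate regression models by MAE, RMSE, and R². The sketch below shows one way those three scores could be computed and used to rank models. It is an illustration only, not the authors' code: the function names, the model labels, and the sample values are all hypothetical, and R² is assumed to be the common 1 − SS_res/SS_tot form, which the record does not specify.

```python
import math


def mae(observed, predicted):
    """Mean absolute error between paired observed and predicted values."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Root mean square error between paired observed and predicted values."""
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


def r2(observed, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rank_models(observed, predictions_by_model):
    """Score each model and return (name, scores) pairs, lowest MAE first."""
    scores = {
        name: {
            "MAE": mae(observed, pred),
            "RMSE": rmse(observed, pred),
            "R2": r2(observed, pred),
        }
        for name, pred in predictions_by_model.items()
    }
    return sorted(scores.items(), key=lambda item: item[1]["MAE"])


if __name__ == "__main__":
    # Invented concentrations (µg m⁻³) for illustration; not taken from the dataset.
    observed = [0.02, 0.05, 0.10, 0.20]
    candidates = {
        "model_a": [0.03, 0.05, 0.09, 0.18],  # hypothetical close fit
        "model_b": [0.10, 0.10, 0.10, 0.10],  # hypothetical constant predictor
    }
    for name, score in rank_models(observed, candidates):
        print(name, score)
```

Sorting by MAE alone is a simplification; a fuller comparison would weigh all three scores on held-out data.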