Data mining-based machine learning methods for improving hydrological data: a case study of salinity field in the Western Arctic Ocean


Bibliographic Details
Main Authors: Tao, Shuhao, Du, Ling, Li, Jiahao
Format: Text
Language: English
Published: 2024
Subjects:
Online Access: https://doi.org/10.5194/essd-2024-138
https://essd.copernicus.org/preprints/essd-2024-138/
Description
Summary: The Beaufort Gyre, in the western Arctic Ocean, is the largest freshwater reservoir in the Arctic Ocean. Long-term changes in freshwater reservoirs are critical for understanding the Arctic Ocean, and data from various sources, particularly measured or reanalyzed data, must be used to the greatest extent possible. Over the past two decades, intensive field observations and ship surveys in the western Arctic Ocean have produced a large amount of CTD data. Multiple machine learning methods were evaluated and merged to reconstruct an annual salinity product for the western Arctic Ocean over the period 2003–2022. The data mining-based machine learning methods use variables determined by physical processes, such as sea level pressure, sea ice concentration, and drift. Sea surface salinity is more susceptible to atmospheric, sea ice, and oceanic changes and is therefore more variable, so our objective was to control its root mean square error (RMSE): during machine learning training, the average RMSE of the CTD and EN4 sea surface salinity fields was constrained to within 0.25 psu. The machine learning process reveals that the uncertainty in predicting sea surface salinity is 0.24 % when constrained by CTD data and falls to 0.02 % when constrained by EN4 data. During data merging and post-calibration, the weight coefficients are constrained by limiting the uncertainty value. Compared with the commonly used EN4 and ORAS5 salinity products in the Arctic Ocean, our salinity product provides more accurate descriptions of freshwater content in the Beaufort Gyre and of depth variations at its halocline base.
The potential applications of this approach to evaluating and integrating the results of multiple machine learning methods extend beyond the salinity field, encompassing hydrometeorology, sea ice ...
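
Note: The summary mentions two quantitative steps, an RMSE limit of 0.25 psu during training and merging weights constrained by uncertainty, without giving the formulas. The sketch below is only an illustration of how such a step might look, not the authors' method. It assumes that models above the RMSE limit are dropped and that the remaining ones are merged with inverse-squared-RMSE weights; the function names `rmse` and `merge_by_uncertainty`, the model names, and the sample salinity values are all hypothetical.

```python
import numpy as np

def rmse(pred, obs):
    """Root mean square error between predicted and observed salinity."""
    pred = np.asarray(pred, dtype=float)
    obs = np.asarray(obs, dtype=float)
    return float(np.sqrt(np.mean((pred - obs) ** 2)))

def merge_by_uncertainty(preds, obs, max_rmse=0.25):
    """Merge model predictions, keeping only models within the RMSE limit.

    Assumption (not from the source): weights are proportional to
    1 / RMSE**2 and normalised to sum to 1.
    """
    errors = {name: rmse(p, obs) for name, p in preds.items()}
    kept = {name: e for name, e in errors.items() if e <= max_rmse}
    if not kept:
        raise ValueError("no model meets the RMSE threshold")
    # Guard against division by zero for a perfect fit.
    inverse = {name: 1.0 / max(e, 1e-12) ** 2 for name, e in kept.items()}
    total = sum(inverse.values())
    weights = {name: w / total for name, w in inverse.items()}
    merged = sum(w * np.asarray(preds[name], dtype=float)
                 for name, w in weights.items())
    return merged, weights, errors

# Made-up sea surface salinity values (psu), for illustration only.
obs = [30.0, 31.0, 32.0]
preds = {
    "model_a": [30.1, 31.1, 32.1],
    "model_b": [29.8, 30.8, 31.8],
    "model_c": [31.0, 32.0, 33.0],  # RMSE of 1.0 psu, excluded by the limit
}
merged, weights, errors = merge_by_uncertainty(preds, obs)
print(weights)
print(merged)
```

In this toy case the model with the smaller RMSE receives the larger weight, and the model above the 0.25 psu limit does not contribute to the merged field.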