UDASH-AI: Unified Database for Arctic and Subarctic Hydrography Optimized for Artificial Intelligence Applications

Bibliographic Details
Main Authors: Chouai, Mohamed, Mieruch-Schnülle, Sebastian, Behrendt, Axel, Vredenborg, Myriel, Rabe, Benjamin
Format: Dataset
Language: English
Published: PANGAEA 2024
Online Access: https://doi.pangaea.de/10.1594/PANGAEA.973235
https://doi.org/10.1594/PANGAEA.973235
Description
Summary: UDASH-AI is an updated version of the UDASH dataset, created to develop an artificial intelligence algorithm, named SalaciaML-Arctic, that supports the visual (human) quality control of the data. UDASH-AI can be used directly with our algorithm, which is provided under the DOI https://doi.org/10.5281/zenodo.11535790 and in the respective GitHub repository, to reproduce our results, extend the methods and more. Additionally, we have implemented SalaciaML-Arctic as a user-friendly app at https://mvre.autoqc.cloud.awi.de.

The following steps were applied to the original UDASH dataset to create UDASH-AI:
• Concatenation of the individual annual txt files into a single csv file.
• The original encoding of missing time and day information in the date/time string, 'T99:99' and '-00T', has been changed to ISO 8601-conformant values: 'T00:00' and '-01T'. So as not to lose this information, we have added a quality flag ('QF_time') in the column next to the date/time, with the following encoding:
  ◦ 0: No missing data (good quality).
  ◦ 1: Missing day.
  ◦ 2: Missing time.
  ◦ 3: Missing day and time.
• Furthermore, we have included the two temperature gradients and the density gradient described in the original UDASH paper as extra columns:
  ◦ Depth over temperature gradient, denoted as 'd/d_Temp_Depth_[m_°C^-1]'.
  ◦ Temperature over depth gradient, denoted as 'd/d_Depth_Temp_[°C_m^-1]'.
  ◦ Density gradient, denoted as 'd/d_Depth_Dens_[kg_m^-4]'.
• Missing values are marked with the indicator NaN.
• We added the quality flags for temperature from the classical/traditional UDASH automatic checks, namely outlier and spike (flag=4), density inversion (flag=3) and suspect gradient (flag=2), as an extra column named 'QF_trad'.
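
The date/time re-encoding described in the summary can be sketched in Python as follows. This is a minimal illustration and not the code used to build UDASH-AI: the function name normalize_datetime and the 'YYYY-MM-DDThh:mm' string layout are assumptions made for the example. Only the placeholder strings ('-00T', 'T99:99'), their replacements and the 'QF_time' values are taken from the description.

```python
from typing import Tuple


def normalize_datetime(stamp: str) -> Tuple[str, int]:
    """Return (ISO 8601 conformant string, QF_time flag) for one date/time string.

    Hypothetical helper: assumes strings shaped like 'YYYY-MM-DDThh:mm',
    where '-00T' marks a missing day and 'T99:99' marks a missing time.
    """
    missing_day = "-00T" in stamp
    missing_time = "T99:99" in stamp

    # Replace the placeholders with valid values, as described in the summary.
    fixed = stamp.replace("-00T", "-01T").replace("T99:99", "T00:00")

    # QF_time encoding: 0 = nothing missing, 1 = missing day,
    # 2 = missing time, 3 = missing day and time.
    flag = (1 if missing_day else 0) + (2 if missing_time else 0)
    return fixed, flag


if __name__ == "__main__":
    for raw in ["1998-07-15T12:30", "1998-07-00T12:30",
                "1998-07-15T99:99", "1998-07-00T99:99"]:
        print(raw, "->", normalize_datetime(raw))
```

Because a replaced value such as 'T00:00' cannot be told apart from a genuine midnight measurement, the flag is what preserves the information that the original entry was missing.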