Snow depth estimation and historical data reconstruction over China based on a random forest machine learning approach


Bibliographic Details
Published in: The Cryosphere
Main Authors: Yang, Jianwei; Jiang, Lingmei; Luojus, Kari; Pan, Jinmei; Lemmetyinen, Juha; Takala, Matias; Wu, Shengli
Format: Article in Journal/Newspaper
Language: English
Published: Copernicus Publications 2020
Online Access: https://doi.org/10.5194/tc-14-1763-2020
https://noa.gwlb.de/receive/cop_mods_00051674
https://noa.gwlb.de/servlets/MCRFileNodeServlet/cop_derivate_00051330/tc-14-1763-2020.pdf
https://tc.copernicus.org/articles/14/1763/2020/tc-14-1763-2020.pdf
Description
Summary: We investigated the capability of the random forest (RF) machine learning (ML) model to estimate snow depth. Four combinations of critical predictor variables were used to train the RF model. We then used three validation datasets, drawn from out-of-bag (OOB) samples, a temporal subset, and a spatiotemporal subset, to verify the fitted RF algorithms. The results indicated the following: (1) the accuracy of the RF model is greatly influenced by geographic location, elevation, and land cover fractions; (2) redundant predictor variables, if highly correlated, affect the RF model only slightly; and (3) the fitted RF algorithms perform better on temporal than on spatial scales, with unbiased root-mean-square errors (RMSEs) of ∼4.4 and ∼7.3 cm, respectively. Finally, we used the fitted RF2 algorithm to retrieve a consistent 32-year daily snow depth dataset from 1987 to 2018. This product was evaluated against independent station observations over the same period. The mean unbiased RMSE and bias were 7.1 and −0.05 cm, respectively, indicating better performance than that of the former snow depth dataset (8.4 and −1.20 cm) from the Environmental and Ecological Science Data Center for West China (WESTDC). Although the RF product was superior to the WESTDC dataset, it still underestimated deep snow cover (>20 cm), with biases of −10.4, −8.9, and −34.1 cm for northeast China (NEC), northern Xinjiang (XJ), and the Qinghai–Tibetan Plateau (QTP), respectively. Additionally, the long-term snow depth datasets (station observations, RF estimates, and the WESTDC product) were analyzed in terms of temporal and spatial variations over China. On a temporal scale, the ground-truth snow depth presented a significant increasing trend from 1987 to 2018, especially in NEC. In contrast, the RF and WESTDC products displayed no significant trends except on the QTP, where the WESTDC product presented a significant decreasing trend, with a correlation coefficient of −0.55, whereas the ground-truth observations and the RF product showed no significant trend. In terms of spatial characteristics, the RF and WESTDC products showed similar trend patterns over China: significant decreasing trends in most areas and a significant increasing trend in central NEC.
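The summary reports accuracy as bias and unbiased RMSE without giving their formulas. The sketch below shows one common way these two metrics are computed, assuming the usual definitions: bias is the mean of (estimate − observation), and unbiased RMSE is the RMSE after that mean difference has been removed. The function names and the sample depths are hypothetical illustrations, not values or code from the paper.

```python
import math


def bias(estimates, observations):
    """Mean difference (estimate - observation), in the input units."""
    diffs = [e - o for e, o in zip(estimates, observations)]
    return sum(diffs) / len(diffs)


def unbiased_rmse(estimates, observations):
    """RMSE of the differences after subtracting their mean (the bias)."""
    b = bias(estimates, observations)
    squared = [((e - o) - b) ** 2 for e, o in zip(estimates, observations)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical snow depths in cm (not data from the paper).
station_depth_cm = [5.0, 12.0, 20.0, 30.0]
estimated_depth_cm = [6.0, 11.0, 17.0, 24.0]

sample_bias = bias(estimated_depth_cm, station_depth_cm)
sample_ubrmse = unbiased_rmse(estimated_depth_cm, station_depth_cm)
print(f"bias = {sample_bias:.2f} cm, unbiased RMSE = {sample_ubrmse:.2f} cm")
```

Under these definitions the ordinary RMSE satisfies RMSE² = (unbiased RMSE)² + bias², so a product can have a near-zero overall bias, as reported for the RF dataset, and still carry a sizeable unbiased RMSE.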