SiDroForest: A comprehensive forest inventory of Siberian boreal forest investigations including drone-based point clouds, individually labelled trees, synthetically generated tree crowns and Sentinel-2 labelled image patches

Bibliographic Details
Main Authors: Geffen, Femke; Heim, Birgit; Brieger, Frederic; Geng, Rongwei; Shevtsova, Iuliia A.; Schulte, Luise; Stuenzi, Simone M.; Bernhardt, Nadine; Troeva, Elena I.; Pestryakova, Luidmila A.; Zakharov, Evgenij S.; Pflug, Bringfried; Herzschuh, Ulrike; Kruse, Stefan
Format: Text
Language: English
Published: 2021
Subjects:
Online Access: https://doi.org/10.5194/essd-2021-281
https://essd.copernicus.org/preprints/essd-2021-281/
id ftcopernicus:oai:publications.copernicus.org:essdd97012
record_format openpolar
institution Open Polar
collection Copernicus Publications: E-Journals
op_collection_id ftcopernicus
language English
description This data collection is an attempt to remedy the scarcity of tree-level forest structure data in the circum-boreal region, while also providing adjusted and labelled tree-level and vegetation-plot-level data for machine learning and upscaling practices. Publicly available comprehensive datasets on tree-level forest structure are rare because governmental agencies, public sectors, and private actors all influence the availability of these datasets. We present datasets of vegetation composition and of tree- and plot-level forest structure for two important vegetation transition zones in Siberia, Russia: the summergreen–evergreen transition zone in central Yakutia and the tundra–taiga transition zone in Chukotka (NE Siberia). The SiDroForest collection contains a variety of data, mainly based on unmanned aerial vehicle (UAV) and field data collected from 64 vegetation plots during fieldwork jointly performed by the Alfred Wegener Institute for Polar and Marine Research (AWI) and the North-Eastern Federal University of Yakutsk (NEFU) during the Chukotka 2018 expedition to Siberia. The data collection consists of four separate datasets. The fieldwork locations are the anchors that bind the data types together, based on the location of the vegetation plot. i) The first dataset (Kruse et al., 2021, https://doi.pangaea.de/10.1594/PANGAEA.933263 ) provides UAV-borne data products covering the 64 vegetation plots surveyed during fieldwork, including structure from motion (SfM) point clouds and point-cloud products such as the Digital Elevation Model (DEM), Canopy Height Model (CHM), Digital Surface Model (DSM) and Digital Terrain Model (DTM) constructed from Red Green Blue (RGB) and Red Green Near Infrared (RGN) orthomosaics. Forest structure and vegetation composition data are crucial in assessing whether a forest will act as a carbon sink under changing climate conditions.
Fieldwork and UAV products can provide such data in depth. ii) The second dataset contains spatial data in the form of point and polygon shapefiles of 872 labelled individual trees and shrubs that were recorded during fieldwork at the same vegetation plots, with information on tree height, crown diameter, and species (van Geffen et al., 2021c, https://doi.pangaea.de/10.1594/PANGAEA.932821 ). These individually labelled tree and shrub point and polygon shapefiles were generated and located on the UAV RGB orthoimages. The individual number links each feature to the information collected during the expedition, such as tree height, crown diameter and vitality, which is provided in table format. This dataset can be used to link individual trees in the SfM point clouds, providing unique insights into the vegetation composition, and it also allows future monitoring of the individual trees and of the contents of the recorded vegetation plots at large. iii) The third dataset contains a synthesis of 10 000 generated images and masks in which the tree crowns of two species of larch (Larix gmelinii and Larix cajanderi) were automatically extracted from the RGB UAV images, provided in the Common Objects in Context (COCO) format (van Geffen et al., 2021a, https://doi.pangaea.de/10.1594/PANGAEA.932795 ). The synthetic dataset was created specifically to detect Siberian larch species. iv) Publicly available forest-structure datasets at tree level are rare for Siberia, and ready-to-use tree- and plot-level data for machine learning approaches, for example optimised data formats containing annotated vegetation categories, are rarer still. The fourth dataset contains Sentinel-2 Level-2 bottom-of-atmosphere labelled image patches with seasonal information and annotated vegetation categories covering the vegetation plots (van Geffen et al., 2021b, https://doi.pangaea.de/10.1594/PANGAEA.933268 ).
The dataset was created with the aim of providing a small, ready-to-use validation and training dataset for various vegetation-related machine learning tasks. The SiDroForest data collection serves a variety of user communities. First, the UAV-derived top-of-canopy structure information, the orthomosaics and the detailed vegetation information in the labelled dataset provide detailed information on forest type, structure and composition for scientific communities with ecological and biological applications. The detailed land cover and vegetation structure information in the first two datasets is of use for the generation and validation of land cover remote sensing products in radar and optical remote sensing. In addition to providing information on forest structure and vegetation composition of the vegetation plots, parts of the SiDroForest collection are prepared for use as training and validation data for machine learning purposes. For example, the synthetic tree crown dataset is generated from the raw UAV images and optimised for use in neural networks. Furthermore, the fourth SiDroForest dataset contains standardised Sentinel-2 labelled image patches, with JSON labels, that provide training data on vegetation class categories for machine learning classification. The SiDroForest data collection serves as a basis for adding future data collected during expeditions performed by the Alfred Wegener Institute, creating a larger dataset in the upcoming years that can provide unique insights into remote, hard-to-reach boreal regions of Siberia.
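The description states that the synthetic tree crown dataset is delivered in COCO format. As a rough illustration of what working with such annotations might look like, the sketch below counts annotated crowns per category in a COCO-style dictionary using only the Python standard library. The file name, image size, category names and annotation values are hypothetical stand-ins, not taken from the SiDroForest files; only the top-level COCO keys (images, categories, annotations) follow the general COCO convention.

```python
import json
from collections import Counter


def crowns_per_category(coco: dict) -> dict:
    """Count annotated objects per category name in a COCO-style dictionary."""
    # Map numeric category ids to their human-readable names.
    names = {cat["id"]: cat["name"] for cat in coco["categories"]}
    # Each annotation refers to exactly one category via "category_id".
    counts = Counter(names[ann["category_id"]] for ann in coco["annotations"])
    return dict(counts)


# Tiny hand-written stand-in for an annotation file (all values hypothetical).
example = {
    "images": [
        {"id": 1, "file_name": "synthetic_0001.png", "width": 512, "height": 512}
    ],
    "categories": [
        {"id": 1, "name": "Larix gmelinii"},
        {"id": 2, "name": "Larix cajanderi"},
    ],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 1,
         "bbox": [12, 30, 80, 95], "area": 7600, "iscrowd": 0},
        {"id": 11, "image_id": 1, "category_id": 1,
         "bbox": [200, 40, 60, 70], "area": 4200, "iscrowd": 0},
        {"id": 12, "image_id": 1, "category_id": 2,
         "bbox": [300, 310, 90, 100], "area": 9000, "iscrowd": 0},
    ],
}

# Round-trip through JSON to mimic reading an annotation file from disk.
summary = crowns_per_category(json.loads(json.dumps(example)))
print(summary)
```

With a real annotation file, the round-trip line would be replaced by `json.load` on the opened file; the actual category names and file layout should be checked against the PANGAEA dataset documentation.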
format Text
author Geffen, Femke
Heim, Birgit
Brieger, Frederic
Geng, Rongwei
Shevtsova, Iuliia A.
Schulte, Luise
Stuenzi, Simone M.
Bernhardt, Nadine
Troeva, Elena I.
Pestryakova, Luidmila A.
Zakharov, Evgenij S.
Pflug, Bringfried
Herzschuh, Ulrike
Kruse, Stefan
author_sort Geffen, Femke
title SiDroForest: A comprehensive forest inventory of Siberian boreal forest investigations including drone-based point clouds, individually labelled trees, synthetically generated tree crowns and Sentinel-2 labelled image patches
publishDate 2021
url https://doi.org/10.5194/essd-2021-281
https://essd.copernicus.org/preprints/essd-2021-281/
geographic Yakutsk
genre Alfred Wegener Institute
Chukotka
taiga
Tundra
Yakutia
Yakutsk
Siberia
op_source eISSN: 1866-3516
op_relation doi:10.5194/essd-2021-281
https://essd.copernicus.org/preprints/essd-2021-281/
op_doi https://doi.org/10.5194/essd-2021-281
_version_ 1766271825801641984
spelling ftcopernicus:oai:publications.copernicus.org:essdd97012 2023-05-15T13:15:55+02:00 SiDroForest: A comprehensive forest inventory of Siberian boreal forest investigations including drone-based point clouds, individually labelled trees, synthetically generated tree crowns and Sentinel-2 labelled image patches Geffen, Femke Heim, Birgit Brieger, Frederic Geng, Rongwei Shevtsova, Iuliia A. Schulte, Luise Stuenzi, Simone M. Bernhardt, Nadine Troeva, Elena I. Pestryakova, Luidmila A. Zakharov, Evgenij S. Pflug, Bringfried Herzschuh, Ulrike Kruse, Stefan 2021-11-23 application/pdf https://doi.org/10.5194/essd-2021-281 https://essd.copernicus.org/preprints/essd-2021-281/ eng eng doi:10.5194/essd-2021-281 https://essd.copernicus.org/preprints/essd-2021-281/ eISSN: 1866-3516 Text 2021 ftcopernicus https://doi.org/10.5194/essd-2021-281 2021-11-29T17:22:29Z Text Alfred Wegener Institute Chukotka taiga Tundra Yakutia Yakutsk Siberia Copernicus Publications: E-Journals Yakutsk