Machine Learning Applied to Reach Classification in a Northern Sweden Catchment

An accurate, fine-resolution classification of river systems benefits the assessment and monitoring of watercourses, as stressed by the European Commission’s Water Framework Directive. Being able to attribute classes using remotely obtained data can be advantageous to perform ex...


Bibliographic Details
Main Author: dos Santos Toledo Busarello, Mariana
Format: Bachelor Thesis
Language: English
Published: Umeå universitet, Institutionen för ekologi, miljö och geovetenskap 2021
Subjects:
Online Access: http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-184140
id ftumeauniv:oai:DiVA.org:umu-184140
record_format openpolar
institution Open Polar
collection Umeå University: Publications (DiVA)
op_collection_id ftumeauniv
language English
topic machine learning
geomorphology
random forest
channel type
Computer Sciences
Datavetenskap (datalogi)
Natural Sciences
Naturvetenskap
Oceanography, Hydrology and Water Resources
Oceanografi, hydrologi och vattenresurser
Computer and Information Sciences
Data- och informationsvetenskap
description An accurate, fine-resolution classification of river systems benefits the assessment and monitoring of watercourses, as stressed by the European Commission’s Water Framework Directive. Attributing classes from remotely obtained data makes it possible to classify reaches extensively without field work, and some methods also identify which features best describe each process domain. In this work, data from two Swedish sub-catchments above the highest coastline were used to train a Random Forest classifier, a machine learning algorithm. The resulting model provided class predictions and an analysis of the most important features. Each study area was analysed separately, and then the two were combined. In the combined case, the analysis was run with and without lakes in the data to check how their presence affected the predictions. The results showed that the accuracy of the estimator was reliable; however, due to data complexity and imbalance, rapids were harder to classify accurately, and the slow-flowing class was overpredicted. Combining the datasets and including lakes lessened the effects of the data imbalance. Using the feature importance and permutation importance methods, the three most important features identified were the channel slope, the median roughness in the 100-m buffer, and the standard deviation of the planform curvature in the 100-m buffer. This finding is supported by previous studies, but other variables expected to contribute strongly, such as lithology and valley confinement, were not relevant, most likely because of the coarseness of the available data. The most frequent errors were also mapped, showing some overlap between error hotspots and areas restored in 2010.
format Bachelor Thesis
author dos Santos Toledo Busarello, Mariana
title Machine Learning Applied to Reach Classification in a Northern Sweden Catchment
publisher Umeå universitet, Institutionen för ekologi, miljö och geovetenskap
publishDate 2021
url http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-184140
genre Northern Sweden
op_relation http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-184140
op_rights info:eu-repo/semantics/openAccess