Diagnosability of mtDNA with Random Forests: Using sequence data to delimit subspecies
Abstract: We examine the use of an ensemble method, Random Forests, to delimit subspecies using mitochondrial DNA (mtDNA) sequences. Diagnosability, a measure of the ability to correctly determine the taxon of a specimen of unknown origin, has historically been used to delimit subspecies, but few s...
Published in: | Marine Mammal Science |
---|---|
Main Authors: | Archer, Frederick I.; Martien, Karen K.; Taylor, Barbara L. |
Format: | Article in Journal/Newspaper |
Language: | English |
Published: | Wiley, 2017 |
Subjects: | |
Online Access: | http://dx.doi.org/10.1111/mms.12414 https://api.wiley.com/onlinelibrary/tdm/v1/articles/10.1111%2Fmms.12414 https://onlinelibrary.wiley.com/doi/pdf/10.1111/mms.12414 |
id | crwiley:10.1111/mms.12414 |
---|---|
record_format | openpolar |
institution | Open Polar |
collection | Wiley Online Library |
op_collection_id | crwiley |
language | English |
description | Abstract: We examine the use of an ensemble method, Random Forests, to delimit subspecies using mitochondrial DNA (mtDNA) sequences. Diagnosability, a measure of the ability to correctly determine the taxon of a specimen of unknown origin, has historically been used to delimit subspecies, but few studies have explored how to estimate it from DNA sequences. Using simulated and empirical data sets, we demonstrate that Random Forests produces classification models that perform well for diagnosing subspecies and species. Populations with strong social structure and relatively low abundances (e.g., killer whales, Orcinus orca) were found to be as diagnosable as species. Conversely, comparisons involving subspecies that are abundant (e.g., spinner and spotted dolphins, Stenella longirostris and S. attenuata) are only as diagnosable as many population comparisons. Estimates of diagnosability reported in subspecies and species descriptions should include confidence intervals, which are influenced by the sample sizes of the training data. We also stress the importance of reporting the certainty with which individuals in the training data are classified in order to communicate the strength of the classification model and diagnosability estimate. Guidance as to ideal minimum diagnosability thresholds for subspecies will improve with more comprehensive analyses; however, values in the range of 80%–90% are considered appropriate. |
format | Article in Journal/Newspaper |
author | Archer, Frederick I.; Martien, Karen K.; Taylor, Barbara L. |
title | Diagnosability of mtDNA with Random Forests: Using sequence data to delimit subspecies |
publisher | Wiley |
publishDate | 2017 |
url | http://dx.doi.org/10.1111/mms.12414 https://api.wiley.com/onlinelibrary/tdm/v1/articles/10.1111%2Fmms.12414 https://onlinelibrary.wiley.com/doi/pdf/10.1111/mms.12414 |
genre | Orca; Orcinus orca |
op_source | Marine Mammal Science, volume 33, issue S1, pages 101-131; ISSN 0824-0469, 1748-7692 |
op_rights | http://onlinelibrary.wiley.com/termsAndConditions#vor |
op_doi | https://doi.org/10.1111/mms.12414 |
container_title | Marine Mammal Science |
container_volume | 33 |
container_issue | S1 |
container_start_page | 101 |
op_container_end_page | 131 |
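The abstract recommends that diagnosability estimates carry confidence intervals, whose width depends on training sample size, and that the certainty of each individual's classification be reported. The following is a minimal Python sketch of both ideas under stated assumptions. It treats diagnosability as the proportion of held-out (or out-of-bag) individuals assigned to their true taxon. It uses a Wilson score interval, which is an assumed choice and not necessarily the interval the authors used. It also treats per-individual certainty as the fraction of trees voting for the majority label. The function names `wilson_interval`, `diagnosability`, and `assignment_certainty` are hypothetical. The sketch does not fit a Random Forest: predicted labels and per-tree votes are assumed to come from a separately trained classifier.

```python
# Hypothetical sketch: diagnosability as a proportion correctly classified,
# with a Wilson score confidence interval and per-individual vote certainty.
# None of these names come from the paper; the Wilson interval is an assumption.
import math
from collections import Counter


def wilson_interval(correct: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (default z gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= correct <= n:
        raise ValueError("correct must lie between 0 and n")
    p = correct / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the extremes.
    return max(0.0, centre - half), min(1.0, centre + half)


def diagnosability(true_labels, predicted_labels, z: float = 1.96):
    """Proportion of individuals assigned to their true taxon, plus its Wilson interval.

    predicted_labels are assumed to be held-out or out-of-bag predictions from a
    separately trained classifier (e.g. a Random Forest); no model is fitted here.
    """
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label sequences must have the same length")
    n = len(true_labels)
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    lo, hi = wilson_interval(correct, n, z)
    return correct / n, lo, hi


def assignment_certainty(tree_votes):
    """Majority label for one individual and the fraction of trees that voted for it."""
    if not tree_votes:
        raise ValueError("need at least one vote")
    label, k = Counter(tree_votes).most_common(1)[0]
    return label, k / len(tree_votes)


if __name__ == "__main__":
    # Same 85% correct-classification rate, two different (hypothetical) sample sizes.
    for correct, n in ((17, 20), (170, 200)):
        lo, hi = wilson_interval(correct, n)
        print(f"n={n}: estimate={correct / n:.2f}, 95% CI=({lo:.3f}, {hi:.3f})")
    # Per-tree votes for one hypothetical individual.
    print(assignment_certainty(["A", "A", "B", "A"]))
```

With these illustrative counts, the n = 20 interval is roughly three times as wide as the n = 200 interval, even though the point estimate is identical. This mirrors the sample-size effect the abstract describes.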