Diagnosability of mtDNA with Random Forests: Using sequence data to delimit subspecies

Abstract We examine the use of an ensemble method, Random Forests, to delimit subspecies using mitochondrial DNA (mtDNA) sequences. Diagnosability, a measure of the ability to correctly determine the taxon of a specimen of unknown origin, has historically been used to delimit subspecies, but few s...

Bibliographic Details
Published in: Marine Mammal Science
Main Authors: Archer, Frederick I., Martien, Karen K., Taylor, Barbara L.
Format: Article in Journal/Newspaper
Language: English
Published: Wiley 2017
Subjects:
Online Access: http://dx.doi.org/10.1111/mms.12414
https://api.wiley.com/onlinelibrary/tdm/v1/articles/10.1111%2Fmms.12414
https://onlinelibrary.wiley.com/doi/pdf/10.1111/mms.12414
id crwiley:10.1111/mms.12414
record_format openpolar
institution Open Polar
collection Wiley Online Library
op_collection_id crwiley
language English
description Abstract We examine the use of an ensemble method, Random Forests, to delimit subspecies using mitochondrial DNA (mtDNA) sequences. Diagnosability, a measure of the ability to correctly determine the taxon of a specimen of unknown origin, has historically been used to delimit subspecies, but few studies have explored how to estimate it from DNA sequences. Using simulated and empirical data sets, we demonstrate that Random Forests produces classification models that perform well for diagnosing subspecies and species. Populations with strong social structure and relatively low abundances (e.g., killer whales, Orcinus orca) were found to be as diagnosable as species. Conversely, comparisons involving subspecies that are abundant (e.g., spinner and spotted dolphins, Stenella longirostris and S. attenuata) are only as diagnosable as many population comparisons. Estimates of diagnosability reported in subspecies and species descriptions should include confidence intervals, which are influenced by the sample sizes of the training data. We also stress the importance of reporting the certainty with which individuals in the training data are classified in order to communicate the strength of the classification model and diagnosability estimate. Guidance as to ideal minimum diagnosability thresholds for subspecies will improve with more comprehensive analyses; however, values in the range of 80%–90% are considered appropriate.
format Article in Journal/Newspaper
author Archer, Frederick I.
Martien, Karen K.
Taylor, Barbara L.
title Diagnosability of mtDNA with Random Forests: Using sequence data to delimit subspecies
publisher Wiley
publishDate 2017
url http://dx.doi.org/10.1111/mms.12414
https://api.wiley.com/onlinelibrary/tdm/v1/articles/10.1111%2Fmms.12414
https://onlinelibrary.wiley.com/doi/pdf/10.1111/mms.12414
genre Orca
Orcinus orca
op_source Marine Mammal Science
volume 33, issue S1, page 101-131
ISSN 0824-0469 1748-7692
op_rights http://onlinelibrary.wiley.com/termsAndConditions#vor
op_doi https://doi.org/10.1111/mms.12414