Dollo-CDP: a polynomial-time algorithm for the clade-constrained large Dollo parsimony problem
The last decade of phylogenetics has seen the development of many methods that leverage constraints plus dynamic programming. The goal of this algorithmic technique is to produce a phylogeny that is optimal with respect to some objective function and that lies within a constrained version of tree sp...
Published in: | Algorithms for Molecular Biology |
---|---|
Main Authors: | , , , |
Format: | Text |
Language: | English |
Published: |
BioMed Central
2024
|
Subjects: | |
Online Access: | http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10775561/ http://www.ncbi.nlm.nih.gov/pubmed/38191515 https://doi.org/10.1186/s13015-023-00249-9 |
id |
ftpubmed:oai:pubmedcentral.nih.gov:10775561 |
---|---|
record_format |
openpolar |
spelling |
ftpubmed:oai:pubmedcentral.nih.gov:10775561 2024-02-11T10:09:09+01:00 Dollo-CDP: a polynomial-time algorithm for the clade-constrained large Dollo parsimony problem Dai, Junyan Rubel, Tobias Han, Yunheng Molloy, Erin K. 2024-01-08 http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10775561/ http://www.ncbi.nlm.nih.gov/pubmed/38191515 https://doi.org/10.1186/s13015-023-00249-9 en eng BioMed Central http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10775561/ http://www.ncbi.nlm.nih.gov/pubmed/38191515 http://dx.doi.org/10.1186/s13015-023-00249-9 © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/) . The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/ (https://creativecommons.org/publicdomain/zero/1.0/) ) applies to the data made available in this article, unless otherwise stated in a credit line to the data. Algorithms Mol Biol Research Text 2024 ftpubmed https://doi.org/10.1186/s13015-023-00249-9 2024-01-14T01:59:31Z The last decade of phylogenetics has seen the development of many methods that leverage constraints plus dynamic programming. The goal of this algorithmic technique is to produce a phylogeny that is optimal with respect to some objective function and that lies within a constrained version of tree space. The popular species tree estimation method ASTRAL, for example, returns a tree that (1) maximizes the quartet score computed with respect to the input gene trees and that (2) draws its branches (bipartitions) from the input constraint set. This technique has yet to be used for parsimony problems where the input are binary characters, sometimes with missing values. Here, we introduce the clade-constrained character parsimony problem and present an algorithm that solves this problem for the Dollo criterion score in [Formula: see text] time, where n is the number of leaves, k is the number of characters, and [Formula: see text] is the set of clades used as constraints. Dollo parsimony, which requires traits/mutations to be gained at most once but allows them to be lost any number of times, is widely used for tumor phylogenetics as well as species phylogenetics, for example analyses of low-homoplasy retroelement insertions across the vertebrate tree of life. This motivated us to implement our algorithm in a software package, called Dollo-CDP, and evaluate its utility for analyzing retroelement insertion presence / absence patterns for bats, birds, toothed whales as well as simulated data. Our results show that Dollo-CDP can improve upon heuristic search from a single starting tree, often recovering a better scoring tree. Moreover, Dollo-CDP scales to data sets with much larger numbers of taxa than branch-and-bound while still having an optimality guarantee, albeit a more restricted one. Lastly, we show that our algorithm for Dollo parsimony can easily be adapted to Camin-Sokal parsimony but not Fitch parsimony. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at ... Text toothed whales PubMed Central (PMC) Algorithms for Molecular Biology 19 1 |
institution |
Open Polar |
collection |
PubMed Central (PMC) |
op_collection_id |
ftpubmed |
language |
English |
topic |
Research |
spellingShingle |
Research Dai, Junyan Rubel, Tobias Han, Yunheng Molloy, Erin K. Dollo-CDP: a polynomial-time algorithm for the clade-constrained large Dollo parsimony problem |
topic_facet |
Research |
description |
The last decade of phylogenetics has seen the development of many methods that leverage constraints plus dynamic programming. The goal of this algorithmic technique is to produce a phylogeny that is optimal with respect to some objective function and that lies within a constrained version of tree space. The popular species tree estimation method ASTRAL, for example, returns a tree that (1) maximizes the quartet score computed with respect to the input gene trees and that (2) draws its branches (bipartitions) from the input constraint set. This technique has yet to be used for parsimony problems where the input are binary characters, sometimes with missing values. Here, we introduce the clade-constrained character parsimony problem and present an algorithm that solves this problem for the Dollo criterion score in [Formula: see text] time, where n is the number of leaves, k is the number of characters, and [Formula: see text] is the set of clades used as constraints. Dollo parsimony, which requires traits/mutations to be gained at most once but allows them to be lost any number of times, is widely used for tumor phylogenetics as well as species phylogenetics, for example analyses of low-homoplasy retroelement insertions across the vertebrate tree of life. This motivated us to implement our algorithm in a software package, called Dollo-CDP, and evaluate its utility for analyzing retroelement insertion presence / absence patterns for bats, birds, toothed whales as well as simulated data. Our results show that Dollo-CDP can improve upon heuristic search from a single starting tree, often recovering a better scoring tree. Moreover, Dollo-CDP scales to data sets with much larger numbers of taxa than branch-and-bound while still having an optimality guarantee, albeit a more restricted one. Lastly, we show that our algorithm for Dollo parsimony can easily be adapted to Camin-Sokal parsimony but not Fitch parsimony. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at ... |
format |
Text |
author |
Dai, Junyan Rubel, Tobias Han, Yunheng Molloy, Erin K. |
author_facet |
Dai, Junyan Rubel, Tobias Han, Yunheng Molloy, Erin K. |
author_sort |
Dai, Junyan |
title |
Dollo-CDP: a polynomial-time algorithm for the clade-constrained large Dollo parsimony problem |
title_short |
Dollo-CDP: a polynomial-time algorithm for the clade-constrained large Dollo parsimony problem |
title_full |
Dollo-CDP: a polynomial-time algorithm for the clade-constrained large Dollo parsimony problem |
title_fullStr |
Dollo-CDP: a polynomial-time algorithm for the clade-constrained large Dollo parsimony problem |
title_full_unstemmed |
Dollo-CDP: a polynomial-time algorithm for the clade-constrained large Dollo parsimony problem |
title_sort |
dollo-cdp: a polynomial-time algorithm for the clade-constrained large dollo parsimony problem |
publisher |
BioMed Central |
publishDate |
2024 |
url |
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10775561/ http://www.ncbi.nlm.nih.gov/pubmed/38191515 https://doi.org/10.1186/s13015-023-00249-9 |
genre |
toothed whales |
genre_facet |
toothed whales |
op_source |
Algorithms Mol Biol |
op_relation |
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10775561/ http://www.ncbi.nlm.nih.gov/pubmed/38191515 http://dx.doi.org/10.1186/s13015-023-00249-9 |
op_rights |
© The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/) . The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/ (https://creativecommons.org/publicdomain/zero/1.0/) ) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
op_doi |
https://doi.org/10.1186/s13015-023-00249-9 |
container_title |
Algorithms for Molecular Biology |
container_volume |
19 |
container_issue |
1 |
_version_ |
1790608920796987392 |