Empirical evaluation of a prior for Bayesian phylogenetic inference

Bibliographic Details
Published in: Philosophical Transactions of the Royal Society B: Biological Sciences
Main Author: Yang, Ziheng
Format: Text
Language: English
Published: The Royal Society, 2008
Publication date: 7 October 2008
Volume: 363
Issue: 1512
Pages: 4031–4039
Article type: Research Article
Genre: baleen whales
Rights: © 2008 The Royal Society
Online Access: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2607411
http://www.ncbi.nlm.nih.gov/pubmed/18852106
https://doi.org/10.1098/rstb.2008.0164

Description

The Bayesian method of phylogenetic inference often produces high posterior probabilities (PPs) for trees or clades, even when the trees are clearly incorrect. The problem appears to be mainly due to the large size of molecular datasets and to the large-sample properties of Bayesian model selection and its sensitivity to the prior when several of the models under comparison are nearly equally correct (or nearly equally wrong) and are of the same dimension. A previous suggestion to alleviate the problem is to let the internal branch lengths in the tree become increasingly small in the prior as the data size increases, so that the bifurcating trees become increasingly star-like. In particular, if the internal branch lengths are assigned an exponential prior, the prior mean μ0 should approach zero faster than 1/√n but more slowly than 1/n, where n is the sequence length. This paper examines the usefulness of this data size-dependent prior using a dataset of the mitochondrial protein-coding genes from the baleen whales, with the prior mean fixed at μ0 = 0.1n^(−2/3). In this dataset, phylogeny reconstruction is sensitive to the assumed evolutionary model, species sampling and the type of data (DNA or protein sequences), but Bayesian inference using the default prior attaches high PPs to conflicting phylogenetic relationships. The data size-dependent prior alleviates the problem to some extent, giving weaker support for unstable relationships. This prior may be useful in reducing apparent conflicts in the results of Bayesian analysis or in making the method less sensitive to model violations.
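The data size-dependent prior evaluated in this paper comes down to simple arithmetic. Internal branch lengths get an exponential prior whose mean μ0 = 0.1n^(−2/3) shrinks as the sequence length n grows, and the exponent −2/3 lies between the two bounding rates 1/√n and 1/n. The sketch below is a minimal illustration under those stated assumptions, not the author's implementation. It computes μ0 for a few sequence lengths, the exponential prior probability that an internal branch is shorter than a threshold, and the ratios of n^(−2/3) to 1/√n and to 1/n. The function names, the 0.001 threshold and the example sequence lengths are hypothetical choices made only for illustration.

```python
import math


def prior_mean_internal(n, c=0.1, alpha=2.0 / 3.0):
    """Hypothetical helper: data size-dependent prior mean mu0 = c * n**(-alpha).

    Defaults (c = 0.1, alpha = 2/3) follow the value quoted in the abstract.
    """
    if n <= 0:
        raise ValueError("sequence length n must be positive")
    return c * n ** (-alpha)


def exp_prior_cdf(b, mean):
    """P(internal branch length <= b) under an exponential prior with this mean."""
    return 1.0 - math.exp(-b / mean)


def rate_ratios(n, alpha=2.0 / 3.0):
    """Ratios of n**(-alpha) to the bounding rates 1/sqrt(n) and 1/n.

    For 1/2 < alpha < 1 the first ratio (n**(1/2 - alpha)) tends to 0 and the
    second (n**(1 - alpha)) grows without bound as n increases, i.e. mu0 goes
    to zero faster than 1/sqrt(n) but more slowly than 1/n.
    """
    mu = n ** (-alpha)
    return mu / n ** -0.5, mu / n ** -1.0


if __name__ == "__main__":
    threshold = 0.001  # hypothetical "near-zero" internal branch length
    for n in (1_000, 10_000, 100_000):  # hypothetical sequence lengths
        mu0 = prior_mean_internal(n)
        r_sqrt, r_lin = rate_ratios(n)
        print(
            f"n={n:>7}  mu0={mu0:.6f}  "
            f"P(b<={threshold})={exp_prior_cdf(threshold, mu0):.3f}  "
            f"ratio to 1/sqrt(n)={r_sqrt:.3f}  ratio to 1/n={r_lin:.1f}"
        )
```

For n = 1000 sites, μ0 = 0.1 × 1000^(−2/3) = 0.001. As n grows, the ratio to 1/√n falls while the ratio to 1/n rises, which is the sense in which the exponent −2/3 satisfies both bounds, and more prior mass is pushed toward very short internal branches.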