WGDTree: a phylogenetic software tool to examine conditional probabilities of retention following whole genome duplication events

Abstract Background Multiple processes impact the probability of retention of individual genes following whole genome duplication (WGD) events. In analyzing two consecutive whole genome duplication events that occurred in the lineage leading to Atlantic salmon, a new phylogenetic statistical analysi...

Full description

Bibliographic Details
Published in:BMC Bioinformatics
Main Authors: C. Nicholas Henry, Kathryn Piper, Amanda E. Wilson, John L. Miraszek, Claire S. Probst, Yuying Rong, David A. Liberles
Format: Article in Journal/Newspaper
Language:English
Published: BMC 2022
Subjects:
Online Access:https://doi.org/10.1186/s12859-022-05042-w
https://doaj.org/article/53894a2446ae4d7eb634b72e26f7697e
Description
Summary:Abstract Background Multiple processes impact the probability of retention of individual genes following whole genome duplication (WGD) events. In analyzing two consecutive whole genome duplication events that occurred in the lineage leading to Atlantic salmon, a new phylogenetic statistical analysis was developed to examine the contingency of retention in one event based upon retention in a previous event. This analysis is intended to evaluate mechanisms of duplicate gene retention and to provide software to generate the test statistic for any genome with pairs of WGDs in its history. Results Here a software package written in Python, ‘WGDTree’ for the analysis of duplicate gene retention following whole genome duplication events is presented. Using gene tree-species tree reconciliation to label gene duplicate nodes and differentiate between WGD and SSD duplicates, the tool calculates a statistic based upon the conditional probability of a gene duplicate being retained after a second whole genome duplication dependent upon the retention status after the first event. The package also contains methods for the simulation of gene trees with WGD events. After running simulations, the accuracy of the placement of events has been determined to be high. The conditional probability statistic has been calculated for Phalaenopsis equestris on a monocot species tree with a pair of consecutive WGD events on its lineage, showing the applicability of the method. Conclusions A new software tool has been created for the analysis of duplicate genes in examination of retention mechanisms. The software tool has been made available on the Python package index and the source code can be found on GitHub here: https://github.com/cnickh/wgdtree .