PHFinder: assisted detection of point heteroplasmy in Sanger sequencing chromatograms

Bibliographic Details
Published in: PeerJ
Main Authors: Marcos Suárez Menéndez, Vania E. Rivera-León, Jooke Robbins, Martine Berube, Per J. Palsbøll
Format: Article in Journal/Newspaper
Language: English
Published: PeerJ Inc. 2023
Subjects:
AB1
R
Online Access: https://doi.org/10.7717/peerj.16028
https://doaj.org/article/b230a1d24d2248cd92230267e386a58b
Description
Summary: Heteroplasmy is the presence of two or more organellar genomes (mitochondrial or plastid DNA) in an organism, tissue, cell or organelle. Heteroplasmy can be detected by visual inspection of Sanger sequencing chromatograms, where it appears as multiple peaks of fluorescence at a single nucleotide position. Visual inspection of chromatograms is both time-consuming and highly subjective, as heteroplasmy is difficult to differentiate from background noise. Few software solutions are available to automate the detection of point heteroplasmies, and those that are available are typically proprietary, lack customization or are unsuitable for automated heteroplasmy assessment in large datasets. Here, we present PHFinder, a Python-based, open-source tool to assist in the detection of point heteroplasmies in large numbers of Sanger chromatograms. PHFinder automatically identifies point heteroplasmies directly from the chromatogram trace data. The program was tested with Sanger sequencing data from 100 humpback whale (Megaptera novaeangliae) tissue samples with known heteroplasmies. PHFinder detected most (90%) of the known heteroplasmies, thereby greatly reducing the amount of visual inspection required. PHFinder is flexible and enables explicit specification of key parameters to infer double peaks (i.e., heteroplasmies).
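The idea of inferring double peaks from trace data can be illustrated with a minimal Python sketch. This is not PHFinder's code or algorithm; it only shows one simple way such a check could work, assuming the four fluorescence channels are already available as lists of intensities and the trace index of each called base is known. The function name find_double_peaks, the parameters min_ratio and min_primary, and all intensity values are hypothetical and chosen for illustration only.

```python
from typing import Dict, List, Tuple


def find_double_peaks(
    traces: Dict[str, List[int]],
    peak_positions: List[int],
    min_ratio: float = 0.3,    # hypothetical threshold: secondary/primary height
    min_primary: int = 100,    # hypothetical threshold: skip low-signal positions
) -> List[Tuple[int, str, str, float]]:
    """Flag called bases whose second-highest channel is a large
    fraction of the highest one (a candidate point heteroplasmy).

    Returns (base_index, primary_base, secondary_base, ratio) tuples.
    """
    hits = []
    for base_index, pos in enumerate(peak_positions):
        # Rank the four channels by intensity at this trace position.
        ranked = sorted(((traces[b][pos], b) for b in traces), reverse=True)
        (primary_height, primary), (secondary_height, secondary) = ranked[0], ranked[1]
        if primary_height < min_primary:
            continue  # signal too weak to judge
        ratio = secondary_height / primary_height
        if ratio >= min_ratio:
            hits.append((base_index, primary, secondary, round(ratio, 2)))
    return hits


# Made-up intensities for three called bases (one trace sample per base).
traces = {
    "A": [900, 20, 30],
    "C": [15, 800, 400],
    "G": [10, 25, 20],
    "T": [30, 10, 850],
}
peak_positions = [0, 1, 2]

hits = find_double_peaks(traces, peak_positions)
print(hits)
```

In this toy input only the third base is flagged, because the C channel reaches roughly 47% of the T channel height there. Raising min_ratio makes the check stricter, which mirrors the kind of user-adjustable parameter the summary describes, although the actual parameters exposed by PHFinder may differ.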