Widespread Deviant Patterns of Heterozygosity in Whole-Genome Sequencing Due to Autopolyploidy, Repeated Elements, and Duplication

Most population genomic tools rely on accurate single nucleotide polymorphism (SNP) calling and filtering to meet their underlying assumptions. However, genomic complexity, resulting from structural variants, paralogous sequences, and repetitive elements, presents significant challenges in assemblin...


Bibliographic Details
Published in: Genome Biology and Evolution
Main Authors: Dallaire, Xavier; Bouchard, Raphael; Hénault, Philippe; Ulmo-Diaz, Gabriela; Normandeau, Eric; Mérot, Claire; Bernatchez, Louis; Moore, Jean-Sébastien
Format: Text
Language: English
Published: Oxford University Press 2023
Online Access: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10752349/
http://www.ncbi.nlm.nih.gov/pubmed/38085037
https://doi.org/10.1093/gbe/evad229
id ftpubmed:oai:pubmedcentral.nih.gov:10752349
record_format openpolar
spelling ftpubmed:oai:pubmedcentral.nih.gov:10752349 2024-01-28T10:04:04+01:00 2023-12-12 2023-12-31T01:52:09Z
institution Open Polar
collection PubMed Central (PMC)
op_collection_id ftpubmed
language English
topic Article
description Most population genomic tools rely on accurate single nucleotide polymorphism (SNP) calling and filtering to meet their underlying assumptions. However, genomic complexity, resulting from structural variants, paralogous sequences, and repetitive elements, presents significant challenges in assembling contiguous reference genomes. Consequently, short-read resequencing studies can encounter mismapping issues, leading to SNPs that deviate from the heterozygosity and allelic ratios expected under Mendelian inheritance. In this study, we employed the ngsParalog software to identify such deviant SNPs in whole-genome sequencing (WGS) data with low (1.5×) to intermediate (4.8×) coverage for four species: Arctic Char (Salvelinus alpinus), Lake Whitefish (Coregonus clupeaformis), Atlantic Salmon (Salmo salar), and the American Eel (Anguilla rostrata). The analyses revealed that deviant SNPs accounted for 22% to 62% of all SNPs in salmonid datasets and approximately 11% in the American Eel dataset. These deviant SNPs were particularly concentrated within repetitive elements and, in salmonids, within genomic regions that had recently undergone rediploidization. Additionally, narrow peaks of elevated coverage were ubiquitous along all four reference genomes, encompassed most deviant SNPs, and could be partially associated with transposons and tandem repeats. Including these deviant SNPs in genomic analyses led to highly distorted site frequency spectra, underestimated pairwise F_ST values, and overestimated nucleotide diversity. Considering the widespread occurrence of deviant SNPs arising from a variety of sources, their substantial impact on estimates of population parameters, and the availability of effective tools to identify them, we propose that excluding deviant SNPs from WGS datasets is required to improve genomic inferences for a wide range of taxa and sequencing depths.
format Text
author Dallaire, Xavier
Bouchard, Raphael
Hénault, Philippe
Ulmo-Diaz, Gabriela
Normandeau, Eric
Mérot, Claire
Bernatchez, Louis
Moore, Jean-Sébastien
title Widespread Deviant Patterns of Heterozygosity in Whole-Genome Sequencing Due to Autopolyploidy, Repeated Elements, and Duplication
publisher Oxford University Press
publishDate 2023
url http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10752349/
http://www.ncbi.nlm.nih.gov/pubmed/38085037
https://doi.org/10.1093/gbe/evad229
geographic Arctic
genre Arctic
Atlantic salmon
Salmo salar
Salvelinus alpinus
op_source Genome Biol Evol
op_relation http://www.ncbi.nlm.nih.gov/pmc/articles/PMC10752349/
http://www.ncbi.nlm.nih.gov/pubmed/38085037
http://dx.doi.org/10.1093/gbe/evad229
op_rights © The Author(s) 2023. Published by Oxford University Press on behalf of Society for Molecular Biology and Evolution.
https://creativecommons.org/licenses/by-nc/4.0/
This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
op_doi https://doi.org/10.1093/gbe/evad229
container_title Genome Biology and Evolution
container_volume 15
container_issue 12
_version_ 1789329656704401408
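
Illustrative note: the abstract in this record describes deviant SNPs as sites whose heterozygosity departs from Mendelian expectation, for example when reads from two paralogous copies map to a single reference position. The Python sketch below shows one simple way such a departure could be measured from called genotypes, using the Hardy-Weinberg expectation 2pq. It is not the ngsParalog method (which the record names but does not detail), and it assumes hard genotype calls, which low-coverage data usually cannot provide reliably. The function names and the threshold of -0.5 are hypothetical, chosen only for illustration.

```python
from typing import Sequence


def expected_heterozygosity(p: float) -> float:
    """Hardy-Weinberg expected heterozygote frequency, 2pq, for allele frequency p."""
    return 2.0 * p * (1.0 - p)


def inbreeding_coefficient(genotypes: Sequence[int]) -> float:
    """Return F = 1 - H_obs / H_exp for one biallelic site.

    genotypes holds the alternate-allele count per diploid individual (0, 1, or 2).
    Strongly negative F means more heterozygotes than expected.
    Monomorphic sites (H_exp == 0) return 0.0 by convention in this sketch.
    """
    n = len(genotypes)
    if n == 0:
        raise ValueError("at least one genotype is required")
    p = sum(genotypes) / (2 * n)
    h_exp = expected_heterozygosity(p)
    if h_exp == 0.0:
        return 0.0
    h_obs = sum(1 for g in genotypes if g == 1) / n
    return 1.0 - h_obs / h_exp


def is_deviant(genotypes: Sequence[int], f_threshold: float = -0.5) -> bool:
    """Flag a site with a large heterozygote excess (hypothetical cutoff)."""
    return inbreeding_coefficient(genotypes) < f_threshold


if __name__ == "__main__":
    all_heterozygous = [1] * 10          # pattern typical of collapsed paralogs
    near_equilibrium = [0, 0, 0, 1, 1, 1, 1, 2, 2, 2]
    print(is_deviant(all_heterozygous))  # heterozygote excess
    print(is_deviant(near_equilibrium))  # close to expectation
```

A site where every individual appears heterozygous gives F = -1, the most extreme excess possible, whereas a site near Hardy-Weinberg proportions gives F near zero. Methods designed for low-coverage data typically work from genotype likelihoods or read counts instead of hard calls.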