Widespread deviant patterns of heterozygosity in whole-genome sequencing due to autopolyploidy, repeated elements, and duplication

Most population genomic tools rely on accurate SNP calling and filtering to meet their underlying assumptions. However, genomic complexity, resulting from structural variants, paralogous sequences, and repetitive elements, presents significant challenges in assembling contiguo...


Bibliographic Details
Published in: Genome Biology and Evolution
Main Authors: Dallaire, Xavier, Bouchard, Raphael, Hénault, Philippe, Ulmo-Díaz, Gabriela, Normandeau, Eric, Mérot, Claire, Bernatchez, Louis, Moore, Jean-Sébastien
Other Authors: Université Laval Québec (ULaval), Ecosystèmes, biodiversité, évolution Rennes (ECOBIO), Université de Rennes (UR)-Institut Ecologie et Environnement - CNRS Ecologie et Environnement (INEE-CNRS), Centre National de la Recherche Scientifique (CNRS)-Centre National de la Recherche Scientifique (CNRS)-Observatoire des Sciences de l'Univers de Rennes (OSUR), Université de Rennes (UR)-Institut national des sciences de l'Univers (INSU - CNRS)-Université de Rennes 2 (UR2)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE)-Institut national des sciences de l'Univers (INSU - CNRS)-Université de Rennes 2 (UR2)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE)-Centre National de la Recherche Scientifique (CNRS)
Format: Article in Journal/Newspaper
Language: English
Published: HAL CCSD 2023
Online Access: https://hal.science/hal-04350851
https://hal.science/hal-04350851/document
https://hal.science/hal-04350851/file/evad229.pdf
https://doi.org/10.1093/gbe/evad229
id ftunivrennes2hal:oai:HAL:hal-04350851v1
record_format openpolar
institution Open Polar
collection Archive Ouverte de l'Université Rennes (HAL)
op_collection_id ftunivrennes2hal
language English
topic heterozygosity
salmonid
whole-genome sequencing
paralog
autopolyploid
repetitive DNA
[SDE.BE]Environmental Sciences/Biodiversity and Ecology
spellingShingle heterozygosity
salmonid
whole-genome sequencing
paralog
autopolyploid
repetitive DNA
[SDE.BE]Environmental Sciences/Biodiversity and Ecology
Dallaire, Xavier
Bouchard, Raphael
Hénault, Philippe
Ulmo-Díaz, Gabriela
Normandeau, Eric
Mérot, Claire
Bernatchez, Louis
Moore, Jean-Sébastien
Widespread deviant patterns of heterozygosity in whole-genome sequencing due to autopolyploidy, repeated elements, and duplication
topic_facet heterozygosity
salmonid
whole-genome sequencing
paralog
autopolyploid
repetitive DNA
[SDE.BE]Environmental Sciences/Biodiversity and Ecology
description Most population genomic tools rely on accurate SNP calling and filtering to meet their underlying assumptions. However, genomic complexity, resulting from structural variants, paralogous sequences, and repetitive elements, presents significant challenges in assembling contiguous reference genomes. Consequently, short-read resequencing studies can encounter mismapping issues, leading to SNPs that deviate from Mendelian expected patterns of heterozygosity and allelic ratio. In this study, we employed the ngsParalog software to identify such deviant SNPs in whole-genome sequencing data with low (1.5X) to intermediate (4.8X) coverage for four species: Arctic Char (Salvelinus alpinus), Lake Whitefish (Coregonus clupeaformis), Atlantic Salmon (Salmo salar), and the American Eel (Anguilla rostrata). The analyses revealed that deviant SNPs accounted for 22 to 62% of all SNPs in salmonid datasets and approximately 11% in the American Eel dataset. These deviant SNPs were particularly concentrated within repetitive elements and genomic regions that had recently undergone rediploidization in salmonids. Additionally, narrow peaks of elevated coverage were ubiquitous along all four reference genomes, encompassed most deviant SNPs, and could be partially associated with transposons and tandem repeats. Including these deviant SNPs in genomic analyses led to highly distorted site frequency spectra, underestimated pairwise FST values, and overestimated nucleotide diversity. Considering the widespread occurrence of deviant SNPs arising from a variety of sources, their important impact in estimating population parameters, and the availability of effective tools to identify them, we propose that excluding deviant SNPs from WGS datasets is required to improve genomic inferences for a wide range of taxa and sequencing depths.
author2 Université Laval Québec (ULaval)
Ecosystèmes, biodiversité, évolution Rennes (ECOBIO)
Université de Rennes (UR)-Institut Ecologie et Environnement - CNRS Ecologie et Environnement (INEE-CNRS)
Centre National de la Recherche Scientifique (CNRS)-Centre National de la Recherche Scientifique (CNRS)-Observatoire des Sciences de l'Univers de Rennes (OSUR)
Université de Rennes (UR)-Institut national des sciences de l'Univers (INSU - CNRS)-Université de Rennes 2 (UR2)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE)-Institut national des sciences de l'Univers (INSU - CNRS)-Université de Rennes 2 (UR2)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE)-Centre National de la Recherche Scientifique (CNRS)
format Article in Journal/Newspaper
author Dallaire, Xavier
Bouchard, Raphael
Hénault, Philippe
Ulmo-Díaz, Gabriela
Normandeau, Eric
Mérot, Claire
Bernatchez, Louis
Moore, Jean-Sébastien
author_facet Dallaire, Xavier
Bouchard, Raphael
Hénault, Philippe
Ulmo-Díaz, Gabriela
Normandeau, Eric
Mérot, Claire
Bernatchez, Louis
Moore, Jean-Sébastien
author_sort Dallaire, Xavier
title Widespread deviant patterns of heterozygosity in whole-genome sequencing due to autopolyploidy, repeated elements, and duplication
title_short Widespread deviant patterns of heterozygosity in whole-genome sequencing due to autopolyploidy, repeated elements, and duplication
title_full Widespread deviant patterns of heterozygosity in whole-genome sequencing due to autopolyploidy, repeated elements, and duplication
title_fullStr Widespread deviant patterns of heterozygosity in whole-genome sequencing due to autopolyploidy, repeated elements, and duplication
title_full_unstemmed Widespread deviant patterns of heterozygosity in whole-genome sequencing due to autopolyploidy, repeated elements, and duplication
title_sort widespread deviant patterns of heterozygosity in whole-genome sequencing due to autopolyploidy, repeated elements, and duplication
publisher HAL CCSD
publishDate 2023
url https://hal.science/hal-04350851
https://hal.science/hal-04350851/document
https://hal.science/hal-04350851/file/evad229.pdf
https://doi.org/10.1093/gbe/evad229
geographic Arctic
geographic_facet Arctic
genre Arctic
Atlantic salmon
Salmo salar
Salvelinus alpinus
genre_facet Arctic
Atlantic salmon
Salmo salar
Salvelinus alpinus
op_source ISSN: 1759-6653
EISSN: 1759-6653
Genome Biology and Evolution
https://hal.science/hal-04350851
Genome Biology and Evolution, 2023, Genome Biology and Evolution, ⟨10.1093/gbe/evad229⟩
op_relation info:eu-repo/semantics/altIdentifier/doi/10.1093/gbe/evad229
info:eu-repo/semantics/altIdentifier/pmid/38085037
hal-04350851
https://hal.science/hal-04350851
https://hal.science/hal-04350851/document
https://hal.science/hal-04350851/file/evad229.pdf
doi:10.1093/gbe/evad229
PUBMED: 38085037
op_rights http://creativecommons.org/licenses/by-nc-nd/
info:eu-repo/semantics/OpenAccess
op_doi https://doi.org/10.1093/gbe/evad229
container_title Genome Biology and Evolution
container_volume 15
container_issue 12
_version_ 1798842266038566912