Widespread deviant patterns of heterozygosity in whole-genome sequencing due to autopolyploidy, repeated elements, and duplication
Most population genomic tools rely on accurate SNP calling and filtering to meet their underlying assumptions. However, genomic complexity, resulting from structural variants, paralogous sequences, and repetitive elements, presents significant challenges in assembling contiguo...
Published in: | Genome Biology and Evolution |
---|---|
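Main Authors: | Dallaire, Xavier; Bouchard, Raphael; Hénault, Philippe; Ulmo-Díaz, Gabriela; Normandeau, Eric; Mérot, Claire; Bernatchez, Louis; Moore, Jean-Sébastien |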
Other Authors: | , , , , |
Format: | Article in Journal/Newspaper |
Language: | English |
Published: | HAL CCSD 2023 |
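Subjects: | heterozygosity; salmonid; whole-genome sequencing; paralog; autopolyploid; repetitive DNA; [SDE.BE]Environmental Sciences/Biodiversity and Ecology |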
Online Access: | https://hal.science/hal-04350851 https://hal.science/hal-04350851/document https://hal.science/hal-04350851/file/evad229.pdf https://doi.org/10.1093/gbe/evad229 |
id |
ftunivrennes1hal:oai:HAL:hal-04350851v1 |
---|---|
record_format |
openpolar |
institution |
Open Polar |
collection |
Université de Rennes 1: Publications scientifiques (HAL) |
op_collection_id |
ftunivrennes1hal |
language |
English |
topic |
heterozygosity salmonid whole-genome sequencing paralog autopolyploid repetitive DNA [SDE.BE]Environmental Sciences/Biodiversity and Ecology |
description |
Most population genomic tools rely on accurate SNP calling and filtering to meet their underlying assumptions. However, genomic complexity, resulting from structural variants, paralogous sequences, and repetitive elements, presents significant challenges in assembling contiguous reference genomes. Consequently, short-read resequencing studies can encounter mismapping issues, leading to SNPs that deviate from Mendelian expected patterns of heterozygosity and allelic ratio. In this study, we employed the ngsParalog software to identify such deviant SNPs in whole-genome sequencing data with low (1.5X) to intermediate (4.8X) coverage for four species: Arctic Char (Salvelinus alpinus), Lake Whitefish (Coregonus clupeaformis), Atlantic Salmon (Salmo salar), and the American Eel (Anguilla rostrata). The analyses revealed that deviant SNPs accounted for 22 to 62% of all SNPs in salmonid datasets and approximately 11% in the American Eel dataset. These deviant SNPs were particularly concentrated within repetitive elements and genomic regions that had recently undergone rediploidization in salmonids. Additionally, narrow peaks of elevated coverage were ubiquitous along all four reference genomes, encompassed most deviant SNPs, and could be partially associated with transposons and tandem repeats. Including these deviant SNPs in genomic analyses led to highly distorted site frequency spectra, underestimated pairwise FST values, and overestimated nucleotide diversity. Considering the widespread occurrence of deviant SNPs arising from a variety of sources, their important impact in estimating population parameters, and the availability of effective tools to identify them, we propose that excluding deviant SNPs from WGS datasets is required to improve genomic inferences for a wide range of taxa and sequencing depths. |
author2 |
Université Laval Québec (ULaval) Ecosystèmes, biodiversité, évolution Rennes (ECOBIO) Université de Rennes (UR)-Institut Ecologie et Environnement - CNRS Ecologie et Environnement (INEE-CNRS) Centre National de la Recherche Scientifique (CNRS)-Centre National de la Recherche Scientifique (CNRS)-Observatoire des Sciences de l'Univers de Rennes (OSUR) Université de Rennes (UR)-Institut national des sciences de l'Univers (INSU - CNRS)-Université de Rennes 2 (UR2)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE)-Institut national des sciences de l'Univers (INSU - CNRS)-Université de Rennes 2 (UR2)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE)-Centre National de la Recherche Scientifique (CNRS) |
format |
Article in Journal/Newspaper |
author |
Dallaire, Xavier Bouchard, Raphael Hénault, Philippe Ulmo-Díaz, Gabriela Normandeau, Eric Mérot, Claire Bernatchez, Louis Moore, Jean-Sébastien |
author_sort |
Dallaire, Xavier |
title |
Widespread deviant patterns of heterozygosity in whole-genome sequencing due to autopolyploidy, repeated elements, and duplication |
publisher |
HAL CCSD |
publishDate |
2023 |
url |
https://hal.science/hal-04350851 https://hal.science/hal-04350851/document https://hal.science/hal-04350851/file/evad229.pdf https://doi.org/10.1093/gbe/evad229 |
geographic |
Arctic |
genre |
Arctic Atlantic salmon Salmo salar Salvelinus alpinus |
op_source |
ISSN: 1759-6653 EISSN: 1759-6653 Genome Biology and Evolution https://hal.science/hal-04350851 Genome Biology and Evolution, 2023, Genome Biology and Evolution, ⟨10.1093/gbe/evad229⟩ |
op_relation |
info:eu-repo/semantics/altIdentifier/doi/10.1093/gbe/evad229 info:eu-repo/semantics/altIdentifier/pmid/38085037 hal-04350851 https://hal.science/hal-04350851 https://hal.science/hal-04350851/document https://hal.science/hal-04350851/file/evad229.pdf doi:10.1093/gbe/evad229 PUBMED: 38085037 |
op_rights |
http://creativecommons.org/licenses/by-nc-nd/ info:eu-repo/semantics/OpenAccess |
op_doi |
https://doi.org/10.1093/gbe/evad229 |
container_title |
Genome Biology and Evolution |
container_volume |
15 |
container_issue |
12 |
_version_ |
1798842383829303296 |
spelling |
ftunivrennes1hal:oai:HAL:hal-04350851v1 2024-05-12T08:00:29+00:00 en eng HAL CCSD Society for Molecular Biology and Evolution info:eu-repo/semantics/article Journal articles 2023 ftunivrennes1hal 2024-04-18T00:07:47Z |
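
The abstract in this record describes deviant SNPs as sites whose heterozygosity departs from Mendelian expectations. As a minimal sketch of that idea (not the ngsParalog method, whose internals this record does not describe), the Python below compares observed heterozygosity with the Hardy-Weinberg expectation 2p(1-p) at two hypothetical loci. The function name, the 0/1/2 genotype coding, and both example datasets are assumptions made purely for illustration.

```python
from collections import Counter


def heterozygosity_summary(genotypes):
    """Summarize one biallelic site.

    `genotypes` holds one value per individual: 0 (hom. ref), 1 (het), 2 (hom. alt).
    Returns (observed_het, expected_het, F), where expected_het = 2p(1-p) under
    Hardy-Weinberg and F = 1 - observed/expected (negative F = excess heterozygotes).
    """
    n = len(genotypes)
    counts = Counter(genotypes)
    p = (2 * counts[2] + counts[1]) / (2 * n)  # alternate allele frequency
    h_obs = counts[1] / n
    h_exp = 2 * p * (1 - p)
    f = 1 - h_obs / h_exp if h_exp > 0 else float("nan")
    return h_obs, h_exp, f


# Hypothetical single-copy locus in Hardy-Weinberg proportions (p = 0.5, n = 20).
mendelian = [0] * 5 + [1] * 10 + [2] * 5

# Hypothetical collapsed paralogs fixed for different alleles: reads from both
# copies map to one reference position, so every individual looks heterozygous.
collapsed = [1] * 20

for name, site in (("mendelian", mendelian), ("collapsed", collapsed)):
    h_obs, h_exp, f = heterozygosity_summary(site)
    print(f"{name}: H_obs={h_obs:.2f} H_exp={h_exp:.2f} F={f:.2f}")
```

Under these assumptions the first locus gives F = 0 and the second gives F = -1. Both sites share the same allele frequency, so a filter based on allele frequency alone would not separate them; the excess of heterozygotes is what marks the second site as deviant in this toy case.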