An accurate assignment test for extremely low-coverage whole-genome sequence data

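Genomic assignment tests can provide important diagnostic biological characteristics, such as population of origin or ecotype. Yet, assignment tests often rely on moderate- to high-coverage sequence data that can be difficult to obtain for fields such as molecular ecology and ancient DNA. We have developed a novel approach that efficiently assigns biologically relevant information (i.e., population identity or structural variants such as inversions) in extremely low-coverage sequence data. First, we generate databases from existing reference data using a subset of diagnostic single nucleotide polymorphisms (SNPs) associated with a biological characteristic. Low-coverage alignment files are subsequently compared to these databases to ascertain allelic state, yielding a joint probability for each association. To assess the efficacy of this approach, we assigned haplotypes and population identity in Heliconius butterflies, Atlantic herring, and Atlantic cod using chromosomal inversion sites and whole-genome data. We scored both modern and ancient specimens, including the first whole-genome sequence data recovered from ancient Atlantic herring bones. The method accurately assigns biological characteristics, including population membership, using extremely low-coverage data (as low as 0.0001x) based on genome-wide SNPs. This approach will therefore increase the number of samples in evolutionary, ecological and archaeological research for which relevant biological information can be obtained.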
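The abstract outlines two steps: building a database of diagnostic SNPs from reference data, then comparing alleles observed in low-coverage alignments against it to obtain a joint probability for each group. The Python sketch below illustrates that general idea under stated assumptions; it is not the authors' implementation. The data layout (haploid 0/1 allele calls per group), the frequency-difference filter `min_diff`, the fixed sequencing error rate, the equal priors, and the names `build_database` and `score_reads` are all hypothetical choices made for illustration.

```python
import math
from typing import Dict, List, Tuple

# A SNP site is identified by (chromosome, position); hypothetical layout.
Site = Tuple[str, int]


def build_database(
    reference: Dict[str, Dict[Site, List[int]]],
    min_diff: float = 0.9,
) -> Dict[Site, Dict[str, float]]:
    """Build a per-site table of alt-allele frequencies for each group.

    `reference` maps group -> site -> haploid allele calls (0 = ref, 1 = alt).
    Only sites where group frequencies differ by at least `min_diff` are kept,
    a hypothetical stand-in for selecting diagnostic SNPs.
    """
    groups = list(reference)
    shared = set.intersection(*(set(reference[g]) for g in groups))
    database: Dict[Site, Dict[str, float]] = {}
    for site in sorted(shared):
        freqs = {g: sum(reference[g][site]) / len(reference[g][site]) for g in groups}
        if max(freqs.values()) - min(freqs.values()) >= min_diff:
            database[site] = freqs
    return database


def score_reads(
    observations: Dict[Site, int],
    database: Dict[Site, Dict[str, float]],
    error: float = 0.01,
) -> Dict[str, float]:
    """Return a posterior probability per group, assuming equal priors.

    `observations` maps site -> allele seen in a read (0 or 1). At extremely low
    coverage most sites are covered by at most one read, so one allele per site
    is assumed here.
    """
    if not database:
        raise ValueError("database contains no diagnostic sites")
    if not 0.0 < error < 0.5:
        raise ValueError("error rate must be between 0 and 0.5")
    groups = list(next(iter(database.values())))
    log_lik = {g: 0.0 for g in groups}
    for site, allele in observations.items():
        if site not in database:
            continue  # the read does not overlap a diagnostic SNP
        for g in groups:
            f = database[site][g]
            # Chance of reading the alt allele, allowing for sequencing error.
            p_alt = f * (1.0 - error) + (1.0 - f) * error
            log_lik[g] += math.log(p_alt if allele == 1 else 1.0 - p_alt)
    # Convert summed log-likelihoods to posteriors (log-sum-exp for stability).
    top = max(log_lik.values())
    weights = {g: math.exp(v - top) for g, v in log_lik.items()}
    total = sum(weights.values())
    return {g: w / total for g, w in weights.items()}


# Toy usage with two made-up groups and three candidate sites.
reference = {
    "north": {("chr1", 100): [1, 1, 1, 1], ("chr1", 200): [0, 0, 0, 0], ("chr2", 50): [1, 0, 1, 0]},
    "south": {("chr1", 100): [0, 0, 0, 0], ("chr1", 200): [1, 1, 1, 1], ("chr2", 50): [1, 0, 0, 1]},
}
db = build_database(reference)  # chr2:50 has frequency 0.5 in both groups and is dropped
reads = {("chr1", 100): 1, ("chr1", 200): 0, ("chr3", 10): 1}
posterior = score_reads(reads, db)
print(posterior)
```

Summing log-likelihoods instead of multiplying raw probabilities keeps the joint probability from underflowing when many diagnostic sites contribute. A real analysis would also have to consider issues such as reference bias, post-mortem damage in ancient DNA, and unequal priors, none of which this sketch attempts.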

Bibliographic Details
Published in: Molecular Ecology Resources
Main Authors: Ferrari, Giada; Atmore, Lane Margaret; Jentoft, Sissel; Jakobsen, Kjetill Sigurd; Makowiecki, Daniel; Barrett, James; Star, Bastiaan
Format: Article in Journal/Newspaper
Language: English
Published: Wiley, 2021
Online Access: https://hdl.handle.net/11250/2977423
https://doi.org/10.1111/1755-0998.13551
id ftntnutrondheimi:oai:ntnuopen.ntnu.no:11250/2977423
record_format openpolar
institution Open Polar
collection NTNU Open Archive (Norwegian University of Science and Technology)
op_collection_id ftntnutrondheimi
language English
description Genomic assignment tests can provide important diagnostic biological characteristics, such as population of origin or ecotype. Yet, assignment tests often rely on moderate- to high-coverage sequence data that can be difficult to obtain for fields such as molecular ecology and ancient DNA. We have developed a novel approach that efficiently assigns biologically relevant information (i.e., population identity or structural variants such as inversions) in extremely low-coverage sequence data. First, we generate databases from existing reference data using a subset of diagnostic single nucleotide polymorphisms (SNPs) associated with a biological characteristic. Low-coverage alignment files are subsequently compared to these databases to ascertain allelic state, yielding a joint probability for each association. To assess the efficacy of this approach, we assigned haplotypes and population identity in Heliconius butterflies, Atlantic herring, and Atlantic cod using chromosomal inversion sites and whole-genome data. We scored both modern and ancient specimens, including the first whole-genome sequence data recovered from ancient Atlantic herring bones. The method accurately assigns biological characteristics, including population membership, using extremely low-coverage data (as low as 0.0001x) based on genome-wide SNPs. This approach will therefore increase the number of samples in evolutionary, ecological and archaeological research for which relevant biological information can be obtained.
format Article in Journal/Newspaper (peer-reviewed journal article, published version)
author Ferrari, Giada
Atmore, Lane Margaret
Jentoft, Sissel
Jakobsen, Kjetill Sigurd
Makowiecki, Daniel
Barrett, James
Star, Bastiaan
title An accurate assignment test for extremely low-coverage whole-genome sequence data
publisher Wiley
publishDate 2021
genre Atlantic cod
op_source 1-15
Molecular Ecology Resources
op_relation Research Council of Norway (Norges forskningsråd): 221734
Research Council of Norway (Norges forskningsråd): 262777
EU – Horizon Europe (EC/HEU): 813383
Molecular Ecology Resources. 2021, 1-15.
urn:issn:1755-098X
cristin:1959570
op_rights Attribution-NonCommercial 4.0 International
http://creativecommons.org/licenses/by-nc/4.0/deed.no
op_rightsnorm CC-BY-NC
op_doi https://doi.org/10.1111/1755-0998.13551
container_title Molecular Ecology Resources
container_volume 22
container_issue 4
container_start_page 1330
op_container_end_page 1344