From reference genomes to population genomics: comparing three reference-aligned reduced-representation sequencing pipelines in two wildlife species

Abstract Background Recent advances in genomics have greatly increased research opportunities for non-model species. For wildlife, a growing availability of reference genomes means that population genetics is no longer restricted to a small set of anonymous loci. When used in conjunction with a refe...


Bibliographic Details
Main Authors: Wright, Belinda, Farquharson, Katherine, McLennan, Elspeth, Belov, Katherine, Hogg, Carolyn, Grueber, Catherine
Format: Article in Journal/Newspaper
Language: unknown
Published: Figshare 2019
Subjects:
Online Access: https://dx.doi.org/10.6084/m9.figshare.c.4528061.v1
https://springernature.figshare.com/collections/From_reference_genomes_to_population_genomics_comparing_three_reference-aligned_reduced-representation_sequencing_pipelines_in_two_wildlife_species/4528061/1
id ftdatacite:10.6084/m9.figshare.c.4528061.v1
record_format openpolar
institution Open Polar
collection DataCite Metadata Store (German National Library of Science and Technology)
op_collection_id ftdatacite
language unknown
topic Genetics
FOS Biological sciences
Evolutionary Biology
Ecology
69999 Biological Sciences not elsewhere classified
80699 Information Systems not elsewhere classified
FOS Computer and information sciences
topic_facet Genetics
FOS Biological sciences
Evolutionary Biology
Ecology
69999 Biological Sciences not elsewhere classified
80699 Information Systems not elsewhere classified
FOS Computer and information sciences
description Abstract
Background: Recent advances in genomics have greatly increased research opportunities for non-model species. For wildlife, the growing availability of reference genomes means that population genetics is no longer restricted to a small set of anonymous loci. When used in conjunction with a reference genome, reduced-representation sequencing (RRS) provides a cost-effective method for obtaining reliable diversity information for population genetics. Many software tools have been developed to process RRS data, though few studies of non-model species incorporate genome alignment in calling loci. A commonly used RRS analysis pipeline, Stacks, has this capacity, so it is timely to compare its utility with existing software originally designed for alignment and analysis of whole genome sequencing data. Here we examine population genetic inferences from two species for which reference-aligned reduced-representation data have been collected. Our two study species are a threatened Australian marsupial (Tasmanian devil, Sarcophilus harrisii; declining population) and an Arctic-circle migrant bird (pink-footed goose, Anser brachyrhynchus; expanding population). Analyses of these data are compared using Stacks versus two widely used genomics packages, SAMtools and GATK. We also introduce a custom R script to improve the reliability of single nucleotide polymorphism (SNP) calls in all pipelines and conduct population genetic inferences for non-model species with reference genomes.
Results: Although we identified orders of magnitude fewer SNPs in our devil dataset than in our goose dataset, we found remarkable symmetry between the two species in our assessment of software performance. For both datasets, all three methods were able to delineate population structure, even with varying numbers of loci. For both species, population structure inferences were influenced by the percentage of missing data.
Conclusions: For studies of non-model species with a reference genome, we recommend combining Stacks output with further filtering (as included in our R pipeline) for population genetic studies, paying particular attention to the potential impact of missing data thresholds. We recognise SAMtools as a viable alternative for researchers more familiar with this software. We caution against the use of GATK in studies with limited computational resources or time.
format Article in Journal/Newspaper
author Wright, Belinda
Farquharson, Katherine
McLennan, Elspeth
Belov, Katherine
Hogg, Carolyn
Grueber, Catherine
author_facet Wright, Belinda
Farquharson, Katherine
McLennan, Elspeth
Belov, Katherine
Hogg, Carolyn
Grueber, Catherine
author_sort Wright, Belinda
title From reference genomes to population genomics: comparing three reference-aligned reduced-representation sequencing pipelines in two wildlife species
title_short From reference genomes to population genomics: comparing three reference-aligned reduced-representation sequencing pipelines in two wildlife species
title_full From reference genomes to population genomics: comparing three reference-aligned reduced-representation sequencing pipelines in two wildlife species
title_fullStr From reference genomes to population genomics: comparing three reference-aligned reduced-representation sequencing pipelines in two wildlife species
title_full_unstemmed From reference genomes to population genomics: comparing three reference-aligned reduced-representation sequencing pipelines in two wildlife species
title_sort from reference genomes to population genomics: comparing three reference-aligned reduced-representation sequencing pipelines in two wildlife species
publisher Figshare
publishDate 2019
url https://dx.doi.org/10.6084/m9.figshare.c.4528061.v1
https://springernature.figshare.com/collections/From_reference_genomes_to_population_genomics_comparing_three_reference-aligned_reduced-representation_sequencing_pipelines_in_two_wildlife_species/4528061/1
geographic Arctic
geographic_facet Arctic
genre Anser brachyrhynchus
Arctic
Pink-footed Goose
genre_facet Anser brachyrhynchus
Arctic
Pink-footed Goose
op_relation https://dx.doi.org/10.1186/s12864-019-5806-y
https://dx.doi.org/10.6084/m9.figshare.c.4528061
op_rights CC BY 4.0
https://creativecommons.org/licenses/by/4.0
op_rightsnorm CC-BY
op_doi https://doi.org/10.6084/m9.figshare.c.4528061.v1
https://doi.org/10.1186/s12864-019-5806-y
https://doi.org/10.6084/m9.figshare.c.4528061
_version_ 1766004769624686592