VLF: An R package for the analysis of very low frequency variants in DNA sequences

Bibliographic Details
Main Authors: Phillips, Jarrett; Athey, Taryn; McNicholas, Paul; Hanner, Robert
Format: Journal article
Language: English
Published: Pensoft Publishers, 2023
Online Access: https://zenodo.org/record/7578612
https://doi.org/10.3897/BDJ.11.e96480
Journal: Biodiversity Data Journal 11: e96480
Keywords: DNA barcoding; frequency matrix; genetic diversity; PCR error; sequencing error; trace file
Rights: Open access; CC BY 4.0 (https://creativecommons.org/licenses/by/4.0/legalcode)
Related items: Figures 1–9 (doi:10.3897/BDJ.11.e96480.figure1 to doi:10.3897/BDJ.11.e96480.figure9) and Supplementary materials 1–2 (doi:10.3897/BDJ.11.e96480.suppl1, doi:10.3897/BDJ.11.e96480.suppl2)

Abstract

Here, we introduce VLF, an R package that determines the distribution of very low frequency variants (VLFs) in nucleotide and amino acid sequences in order to analyse errors in DNA sequence records. The package lets users assess VLFs in aligned and trimmed protein-coding sequences: it automatically calculates the frequency of each nucleotide or amino acid at every sequence position and reports those that occur below a user-specified frequency threshold (default p = 0.001). These results can then be used to explore fundamental population genetic and phylogeographic patterns, mechanisms and processes at the microevolutionary level, such as nucleotide and amino acid sequence conservation.

Our package extends earlier work that implemented VLF analysis in Microsoft Excel, an approach that proved computationally slow and error-prone. Here we compare the results of the two implementations on a large DNA barcode dataset of bird species and find them highly consistent. The differences that do occur are readily explained by manual human error and by inadequate Linnean taxonomy (specifically, species synonymy). We also apply VLF to a subset of avian barcodes to assess the extent of biological artifacts at the species level in the Canada goose (Branta canadensis), and to a large dataset of DNA barcodes for fishes of forensic and regulatory importance. Compared with the previous implementation, VLF offers a high level of automation, speed, scalability and ease of use, characteristics that will become increasingly valuable as sequence data continue to accumulate rapidly in popular reference databases such as BOLD and GenBank.
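The core computation described in the abstract (per-position character frequencies across an alignment, then reporting characters that fall below a frequency threshold) can be sketched in a few lines. This is a minimal Python illustration under stated assumptions, not the VLF package's R code or API: the function names `position_frequencies` and `find_vlfs`, the strict `< p` comparison, and the choice to skip gap (`-`) and ambiguous (`N`) characters when reporting are all hypothetical choices made for this sketch.

```python
from collections import Counter


def position_frequencies(seqs):
    """For each alignment column, map each observed character to its relative frequency."""
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned and trimmed to equal length")
    n = len(seqs)
    freqs = []
    for i in range(length):
        counts = Counter(s[i] for s in seqs)  # characters observed in column i
        freqs.append({ch: c / n for ch, c in counts.items()})
    return freqs


def find_vlfs(seqs, p=0.001, ignore="-N"):
    """Return (sequence index, position, character, frequency) for every
    character whose column frequency is strictly below p.

    Assumption (hypothetical, not taken from the VLF package): characters in
    `ignore` are never reported, but they still count toward the column total.
    """
    freqs = position_frequencies(seqs)
    hits = []
    for j, seq in enumerate(seqs):
        for i, ch in enumerate(seq):
            if ch in ignore:
                continue
            f = freqs[i][ch]
            if f < p:
                hits.append((j, i, ch, f))
    return hits


if __name__ == "__main__":
    # Toy alignment: 24 identical sequences plus one with G->T at position 2 (0-based).
    # A threshold of 0.05 is used because the default p = 0.001 needs > 1000 sequences
    # before a single-occurrence variant can fall below it.
    toy = ["ACGTAC"] * 24 + ["ACTTAC"]
    print(find_vlfs(toy, p=0.05))  # → [(24, 2, 'T', 0.04)]
```

The same function works unchanged on aligned amino acid strings, but `ignore` should then be set to something like `"-X"`, because `N` denotes asparagine in protein sequences. The per-position distribution of VLFs can be tallied from the result with `Counter(i for _, i, _, _ in hits)`.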