VLF: An R package for the analysis of very low frequency variants in DNA sequences
Here, we introduce VLF, an R package to determine the distribution of very low frequency variants (VLFs) in nucleotide and amino acid sequences for the analysis of errors in DNA sequence records. The package allows users to assess VLFs in aligned and trimmed protein-coding sequences by automatically...
Published in: | Biodiversity Data Journal |
---|---|
Main Authors: | Jarrett Phillips, Taryn Athey, Paul McNicholas, Robert Hanner |
Format: | Article in Journal/Newspaper |
Language: | English |
Published: | Pensoft Publishers 2023 |
Subjects: | DNA barcoding; frequency matrix; genetic diversity; Biology (General); QH301-705.5 |
Online Access: | https://doi.org/10.3897/BDJ.11.e96480 https://doaj.org/article/f78a859ff600433b853ad3e9f3a780a3 |
id |
ftdoajarticles:oai:doaj.org/article:f78a859ff600433b853ad3e9f3a780a3 |
---|---|
record_format |
openpolar |
institution |
Open Polar |
collection |
Directory of Open Access Journals: DOAJ Articles |
op_collection_id |
ftdoajarticles |
language |
English |
topic |
DNA barcoding frequency matrix genetic diversity Biology (General) QH301-705.5 |
description |
Here, we introduce VLF, an R package to determine the distribution of very low frequency variants (VLFs) in nucleotide and amino acid sequences for the analysis of errors in DNA sequence records. The package allows users to assess VLFs in aligned and trimmed protein-coding sequences by automatically calculating the frequency of nucleotides or amino acids at each sequence position and outputting those that occur below a user-specified frequency (default of p = 0.001). These results can then be used to explore fundamental population genetic and phylogeographic patterns, mechanisms and processes at the microevolutionary level, such as nucleotide and amino acid sequence conservation. Our package extends earlier work pertaining to an implementation of VLF analysis in Microsoft Excel, which was found to be both computationally slow and error-prone. We compare those results to our own herein. Results from the two implementations are found to be highly consistent for a large DNA barcode dataset of bird species. Differences in results are readily explained by both manual human error and inadequate Linnean taxonomy (specifically, species synonymy). Here, VLF is also applied to a subset of avian barcodes to assess the extent of biological artifacts at the species level for Canada goose (Branta canadensis), as well as within a large dataset of DNA barcodes for fishes of forensic and regulatory importance. The novelty of VLF and its benefits over the previous implementation include its high level of automation, speed, scalability and ease of use, all desirable characteristics that will be extremely valuable as more sequence data are rapidly accumulated in popular reference databases, such as BOLD and GenBank. |
format |
Article in Journal/Newspaper |
author |
Jarrett Phillips Taryn Athey Paul McNicholas Robert Hanner |
author_sort |
Jarrett Phillips |
title |
VLF: An R package for the analysis of very low frequency variants in DNA sequences |
publisher |
Pensoft Publishers |
publishDate |
2023 |
url |
https://doi.org/10.3897/BDJ.11.e96480 https://doaj.org/article/f78a859ff600433b853ad3e9f3a780a3 |
geographic |
Canada |
genre |
Branta canadensis Canada Goose |
op_source |
Biodiversity Data Journal, Vol 11, Pp 1-30 (2023) |
op_relation |
https://bdj.pensoft.net/article/96480/download/pdf/ https://bdj.pensoft.net/article/96480/download/xml/ https://bdj.pensoft.net/article/96480/ https://doaj.org/toc/1314-2828 doi:10.3897/BDJ.11.e96480 1314-2828 https://doaj.org/article/f78a859ff600433b853ad3e9f3a780a3 |
op_doi |
https://doi.org/10.3897/BDJ.11.e96480 |
container_title |
Biodiversity Data Journal |
container_volume |
11 |
_version_ |
1766381046677372928 |
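
The abstract describes the core calculation as counting how often each nucleotide or amino acid occurs at each position of an alignment and reporting those that fall below a user-specified frequency (default p = 0.001). The Python sketch below illustrates that idea only. It is not the VLF package's code or API: the function name `low_frequency_variants`, its return format, and the choice to treat gaps and ambiguity codes as ordinary symbols are all assumptions made for illustration.

```python
from collections import Counter


def low_frequency_variants(sequences, threshold=0.001):
    """Hypothetical helper (not part of the VLF R package).

    Returns a list of (sequence_index, position, symbol, frequency) tuples,
    one for every symbol whose frequency in its alignment column is
    strictly below `threshold`. Positions are 0-based.
    """
    if not sequences:
        return []

    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must be aligned to the same length")

    n = len(sequences)

    # One Counter per alignment column: symbol -> number of sequences with it.
    # Assumption: gaps and ambiguity codes are counted like any other symbol.
    column_counts = [Counter(column) for column in zip(*sequences)]

    hits = []
    for seq_index, seq in enumerate(sequences):
        for position, symbol in enumerate(seq):
            frequency = column_counts[position][symbol] / n
            if frequency < threshold:
                hits.append((seq_index, position, symbol, frequency))
    return hits


# Toy alignment: the fourth sequence carries a 'T' at position 1 that
# appears in only 1 of 4 sequences (frequency 0.25).
toy = ["ACGT", "ACGT", "ACGT", "ATGT"]
print(low_frequency_variants(toy, threshold=0.3))  # [(3, 1, 'T', 0.25)]
```

Under this reading of the threshold, a variant carried by a single sequence has frequency 1/N, so it is only reported at the default p = 0.001 when the alignment holds more than 1000 sequences. The toy call therefore uses a larger threshold to produce a hit.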