ModEst - Precise estimation of genome size from NGS data
Accurate estimates of genome sizes are important parameters for both theoretical and practical biodiversity genomics. We present here a fast, easy-to-implement and precise method to estimate genome size from the number of bases sequenced and the mean sequencing depth. To estimate the latter, we take...
Main Authors: | Schell, Tilman; Pfenninger, Markus; Schönnenbeck, Philipp |
---|---|
Format: | Article in Journal/Newspaper |
Language: | unknown |
Published: | Zenodo 2022 |
Subjects: | genome size simulation |
Online Access: | https://dx.doi.org/10.5281/zenodo.5903272 https://zenodo.org/record/5903272 |
id |
ftdatacite:10.5281/zenodo.5903272 |
---|---|
record_format |
openpolar |
spelling |
ftdatacite:10.5281/zenodo.5903272 2023-05-15T18:15:52+02:00 ModEst - Precise estimation of genome size from NGS data Schell, Tilman Pfenninger, Markus Schönnenbeck, Philipp 2022 https://dx.doi.org/10.5281/zenodo.5903272 https://zenodo.org/record/5903272 unknown Zenodo https://zenodo.org/communities/dryad https://dx.doi.org/10.5061/dryad.dr7sqvb0j https://dx.doi.org/10.5281/zenodo.5903271 https://zenodo.org/communities/dryad Open Access MIT License https://opensource.org/licenses/MIT mit info:eu-repo/semantics/openAccess MIT genome size simulation article Software SoftwareSourceCode 2022 ftdatacite https://doi.org/10.5281/zenodo.5903272 https://doi.org/10.5061/dryad.dr7sqvb0j https://doi.org/10.5281/zenodo.5903271 2022-02-09T13:46:27Z Accurate estimates of genome sizes are important parameters for both theoretical and practical biodiversity genomics. We present here a fast, easy-to-implement and precise method to estimate genome size from the number of bases sequenced and the mean sequencing depth. To estimate the latter, we take advantage of the fact that a precise estimation of the Poisson distribution parameter lambda is possible from truncated data, restricted to the part of the sequencing depth distribution representing the true underlying distribution. Simulations showed that reasonable genome size estimates can be obtained even from low-coverage (10X), highly discontinuous genome drafts. Comparison of estimates from a wide range of taxa and sequencing strategies with flow-cytometry estimates of the same individuals showed a very good fit and suggested that both methods yield comparable, interchangeable results. To illustrate the influence of factors such as sequencing depth, genome size, repeat content and repeat distribution on the different genome size estimation methods, we simulated five different genomes according to real examples. 
The latest genome assemblies and annotations of Saccharomyces cerevisiae, Caenorhabditis elegans, Arabidopsis thaliana, Drosophila melanogaster and Scophthalmus maximus were used to obtain distributions of the sizes of, and distances between, annotated repeat regions. Simulated genomes of the size of the five genome assemblies mentioned above were then created using a custom Python tool, available at https://github.com/Croxa/Simulate-Genome. Regions annotated as repeat regions (rr) were filled with random repeat units of up to 10 bp in length, and high-complexity regions with random nucleotides. For simplicity, we simulated the genomes on a single chromosome. A mean GC content of 0.5 was applied to both categories. Article in Journal/Newspaper Scophthalmus maximus DataCite Metadata Store (German National Library of Science and Technology) Lambda ENVELOPE(-62.983,-62.983,-64.300,-64.300) |
institution |
Open Polar |
collection |
DataCite Metadata Store (German National Library of Science and Technology) |
op_collection_id |
ftdatacite |
language |
unknown |
topic |
genome size simulation |
spellingShingle |
genome size simulation Schell, Tilman Pfenninger, Markus Schönnenbeck, Philipp ModEst - Precise estimation of genome size from NGS data |
topic_facet |
genome size simulation |
description |
Accurate estimates of genome sizes are important parameters for both theoretical and practical biodiversity genomics. We present here a fast, easy-to-implement and precise method to estimate genome size from the number of bases sequenced and the mean sequencing depth. To estimate the latter, we take advantage of the fact that a precise estimation of the Poisson distribution parameter lambda is possible from truncated data, restricted to the part of the sequencing depth distribution representing the true underlying distribution. Simulations showed that reasonable genome size estimates can be obtained even from low-coverage (10X), highly discontinuous genome drafts. Comparison of estimates from a wide range of taxa and sequencing strategies with flow-cytometry estimates of the same individuals showed a very good fit and suggested that both methods yield comparable, interchangeable results. To illustrate the influence of factors such as sequencing depth, genome size, repeat content and repeat distribution on the different genome size estimation methods, we simulated five different genomes according to real examples. The latest genome assemblies and annotations of Saccharomyces cerevisiae, Caenorhabditis elegans, Arabidopsis thaliana, Drosophila melanogaster and Scophthalmus maximus were used to obtain distributions of the sizes of, and distances between, annotated repeat regions. Simulated genomes of the size of the five genome assemblies mentioned above were then created using a custom Python tool, available at https://github.com/Croxa/Simulate-Genome. Regions annotated as repeat regions (rr) were filled with random repeat units of up to 10 bp in length, and high-complexity regions with random nucleotides. For simplicity, we simulated the genomes on a single chromosome. A mean GC content of 0.5 was applied to both categories. |
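The estimation idea in the description can be illustrated with a minimal Python sketch. It is not the ModEst implementation: the function names (`truncated_poisson_loglik`, `estimate_lambda`, `genome_size`), the choice of truncation window, and the ternary-search optimiser are all assumptions made for illustration. The sketch fits the Poisson parameter lambda by maximum likelihood to a depth histogram restricted to a window `[lo, hi]`, then divides the number of sequenced bases by the fitted lambda.

```python
import math

def truncated_poisson_loglik(log_lam, hist, lo, hi):
    """Log-likelihood of depth counts under a Poisson truncated to [lo, hi].

    hist maps depth -> number of positions; all keys must lie in [lo, hi].
    The exp(-lambda) factor cancels between numerator and normaliser.
    """
    terms = {k: k * log_lam - math.lgamma(k + 1) for k in range(lo, hi + 1)}
    peak = max(terms.values())
    # log-sum-exp for a numerically stable normalising constant
    log_norm = peak + math.log(sum(math.exp(t - peak) for t in terms.values()))
    return sum(n * (terms[k] - log_norm) for k, n in hist.items())

def estimate_lambda(hist, lo, hi, iters=200):
    """Maximum-likelihood lambda from the part of the histogram inside [lo, hi].

    The likelihood is concave in log(lambda), so a ternary search suffices
    (an illustrative choice, not necessarily what the authors use).
    """
    window = {k: n for k, n in hist.items() if lo <= k <= hi}
    left, right = math.log(1e-3), math.log(10.0 * hi + 10.0)
    for _ in range(iters):
        m1 = left + (right - left) / 3.0
        m2 = right - (right - left) / 3.0
        if (truncated_poisson_loglik(m1, window, lo, hi)
                < truncated_poisson_loglik(m2, window, lo, hi)):
            left = m1
        else:
            right = m2
    return math.exp((left + right) / 2.0)

def genome_size(total_bases, lam):
    """Genome size as number of bases sequenced divided by mean depth."""
    return total_bases / lam
```

A call such as `genome_size(total_bases, estimate_lambda(hist, lo, hi))` then gives the estimate. How the window `[lo, hi]` should be chosen so that it covers only the part of the depth distribution that reflects the true underlying distribution is not specified in this record and is left to the caller here.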
format |
Article in Journal/Newspaper |
author |
Schell, Tilman Pfenninger, Markus Schönnenbeck, Philipp |
author_facet |
Schell, Tilman Pfenninger, Markus Schönnenbeck, Philipp |
author_sort |
Schell, Tilman |
title |
ModEst - Precise estimation of genome size from NGS data |
title_short |
ModEst - Precise estimation of genome size from NGS data |
title_full |
ModEst - Precise estimation of genome size from NGS data |
title_fullStr |
ModEst - Precise estimation of genome size from NGS data |
title_full_unstemmed |
ModEst - Precise estimation of genome size from NGS data |
title_sort |
modest - precise estimation of genome size from ngs data |
publisher |
Zenodo |
publishDate |
2022 |
url |
https://dx.doi.org/10.5281/zenodo.5903272 https://zenodo.org/record/5903272 |
long_lat |
ENVELOPE(-62.983,-62.983,-64.300,-64.300) |
geographic |
Lambda |
geographic_facet |
Lambda |
genre |
Scophthalmus maximus |
genre_facet |
Scophthalmus maximus |
op_relation |
https://zenodo.org/communities/dryad https://dx.doi.org/10.5061/dryad.dr7sqvb0j https://dx.doi.org/10.5281/zenodo.5903271 https://zenodo.org/communities/dryad |
op_rights |
Open Access MIT License https://opensource.org/licenses/MIT mit info:eu-repo/semantics/openAccess |
op_rightsnorm |
MIT |
op_doi |
https://doi.org/10.5281/zenodo.5903272 https://doi.org/10.5061/dryad.dr7sqvb0j https://doi.org/10.5281/zenodo.5903271 |
_version_ |
1766189104193601536 |
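The genome simulation outlined in the description can be sketched as follows. This is a simplified illustration, not the code of the Simulate-Genome tool linked above: the function names and the `segments` input (a list of region kinds and lengths) are hypothetical, and region sizes are supplied by the caller rather than drawn from real repeat annotations. Repeat regions (`"rr"`) are tiled with a random unit of 1 to 10 bp, other regions are filled with random nucleotides, and uniform nucleotide draws give a mean GC content of 0.5 in both categories.

```python
import random

NUCLEOTIDES = "ACGT"  # uniform draws give a mean GC content of 0.5

def random_sequence(length, rng):
    """Random nucleotides, used for high-complexity regions."""
    return "".join(rng.choice(NUCLEOTIDES) for _ in range(length))

def repeat_region(length, rng, max_unit=10):
    """Tile one random repeat unit of 1..max_unit bp to fill the region."""
    unit = random_sequence(rng.randint(1, max_unit), rng)
    copies = length // len(unit) + 1
    return (unit * copies)[:length]

def simulate_genome(segments, seed=0):
    """Build a single-chromosome genome from (kind, length) segments.

    kind "rr" marks a repeat region; any other kind is treated as a
    high-complexity region (a labelling assumed for this sketch).
    """
    rng = random.Random(seed)
    parts = []
    for kind, length in segments:
        if kind == "rr":
            parts.append(repeat_region(length, rng))
        else:
            parts.append(random_sequence(length, rng))
    return "".join(parts)
```

For example, `simulate_genome([("hc", 1000), ("rr", 200), ("hc", 500)], seed=1)` returns a 1700 bp sequence with one repeat region between two high-complexity regions.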