Patterns of sequencing coverage bias revealed by ultra-deep sequencing of vertebrate mitochondria

Bibliographic Details
Published in: BMC Genomics
Main Authors: Ekblom, Robert; Smeds, Linnea; Ellegren, Hans
Format: Article in Journal/Newspaper
Language: English
Published: Uppsala universitet, Evolutionsbiologi 2014
Subjects:
SSE
Online Access: http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-228703
https://doi.org/10.1186/1471-2164-15-467
Description
Summary: Background: Genome and transcriptome sequencing applications that rely on variation in sequence depth can be negatively affected if there are systematic biases in coverage. We have investigated patterns of local variation in sequencing coverage by utilising ultra-deep sequencing (>100,000X) of mtDNA obtained during sequencing of two vertebrate genomes, wolverine (Gulo gulo) and collared flycatcher (Ficedula albicollis). With such extreme depth, stochastic variation in coverage should be negligible, which allows us to provide a very detailed, fine-scale picture of sequence-dependent coverage variation and sequencing error rates. Results: Sequencing coverage showed up to six-fold variation across the complete mtDNA, and this variation was highly repeatable when multiple individuals of the same species were sequenced. Moreover, coverage in orthologous regions was correlated between the two species and was negatively correlated with GC content. We also found a negative correlation between the site-specific sequencing error rate and coverage, with certain sequence motifs, such as "CCNGCC", being particularly prone to high rates of error and low coverage. Conclusions: Our results demonstrate that inherent sequence characteristics govern variation in coverage and suggest that some sources of this variation, such as GC content, should be controlled for in applications such as RNA-Seq and detection of copy number variation.
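
The relationships described in the summary (coverage against GC content, and the locations of "CCNGCC" motifs) could be explored with a short script along the following lines. This is a minimal sketch, not the authors' pipeline: the function names, the fixed non-overlapping windows, and the use of a plain Pearson correlation are all assumptions made for illustration, and the record does not say which statistics or window sizes the study used.

```python
import re
from statistics import mean

# Hypothetical helpers for illustration only; not taken from the study.

def gc_fraction(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty string)."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq) if seq else 0.0

def windowed_gc(seq, window):
    """GC fraction in consecutive, non-overlapping windows of fixed size."""
    return [gc_fraction(seq[i:i + window])
            for i in range(0, len(seq) - window + 1, window)]

def windowed_mean(depths, window):
    """Mean per-site depth over the same non-overlapping windows."""
    return [mean(depths[i:i + window])
            for i in range(0, len(depths) - window + 1, window)]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

# "N" in CCNGCC is read here as any of A, C, G, T. The lookahead lets
# overlapping occurrences be reported as well.
MOTIF = re.compile(r"(?=CC[ACGT]GCC)")

def motif_positions(seq):
    """0-based start positions of every CCNGCC occurrence."""
    return [m.start() for m in MOTIF.finditer(seq.upper())]
```

Given a reference sequence and a per-site depth list of the same length, `pearson(windowed_gc(seq, w), windowed_mean(depths, w))` would give a window-level correlation; a negative value would be in line with the direction reported in the summary.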