Tandem repeats lead to sequence assembly errors and impose multi-level challenges for genome and protein databases ...
The widespread occurrence of repetitive stretches of DNA in genomes of organisms across the tree of life imposes fundamental challenges for sequencing, genome assembly, and automated annotation of genes and proteins. This multi-level problem can lead to errors in genome and protein databases that ar...
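Main Authors: | Tørresen, Ole K; Star, Bastiaan; Mier, Pablo; Andrade-Navarro, Miguel A; Bateman, Alex; Jarnot, Patryk; Gruca, Aleksandra; Grynberg, Marcin; Kajava, Andrey V; Promponas, Vasilis J; Anisimova, Maria; Jakobsen, Kjetill S; Linke, Dirk |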
---|---|
Format: | Text |
Language: | English |
Published: | Oxford University Press 2019 |
Subjects: | Genomics Bioinformatics FOS Computer and information sciences 572 Biochemie |
Online Access: | https://dx.doi.org/10.21256/zhaw-18481 https://digitalcollection.zhaw.ch/handle/11475/18481 |
id |
ftdatacite:10.21256/zhaw-18481 |
---|---|
record_format |
openpolar |
spelling |
ftdatacite:10.21256/zhaw-18481 2024-09-15T17:55:30+00:00 Tandem repeats lead to sequence assembly errors and impose multi-level challenges for genome and protein databases ... Tørresen, Ole K Star, Bastiaan Mier, Pablo Andrade-Navarro, Miguel A Bateman, Alex Jarnot, Patryk Gruca, Aleksandra Grynberg, Marcin Kajava, Andrey V Promponas, Vasilis J Anisimova, Maria Jakobsen, Kjetill S Linke, Dirk 2019 application/pdf https://dx.doi.org/10.21256/zhaw-18481 https://digitalcollection.zhaw.ch/handle/11475/18481 en eng Oxford University Press Creative Commons Attribution 4.0 International https://creativecommons.org/licenses/by/4.0/legalcode cc-by-4.0 Genomics Bioinformatics FOS Computer and information sciences 572 Biochemie article-journal Journal Article Text ScholarlyArticle 2019 ftdatacite https://doi.org/10.21256/zhaw-18481 2024-07-03T10:39:58Z The widespread occurrence of repetitive stretches of DNA in genomes of organisms across the tree of life imposes fundamental challenges for sequencing, genome assembly, and automated annotation of genes and proteins. This multi-level problem can lead to errors in genome and protein databases that are often not recognized or acknowledged. As a consequence, end users working with sequences with repetitive regions are faced with 'ready-to-use' deposited data whose trustworthiness is difficult to determine, let alone to quantify. Here, we provide a review of the problems associated with tandem repeat sequences that originate from different stages during the sequencing-assembly-annotation-deposition workflow, and that may proliferate in public database repositories affecting all downstream analyses. As a case study, we provide examples of the Atlantic cod genome, whose sequencing and assembly were hindered by a particularly high prevalence of tandem repeats. We complement this case study with examples from other ... Text atlantic cod DataCite |
institution |
Open Polar |
collection |
DataCite |
op_collection_id |
ftdatacite |
language |
English |
topic |
Genomics Bioinformatics FOS Computer and information sciences 572 Biochemie |
description |
The widespread occurrence of repetitive stretches of DNA in genomes of organisms across the tree of life imposes fundamental challenges for sequencing, genome assembly, and automated annotation of genes and proteins. This multi-level problem can lead to errors in genome and protein databases that are often not recognized or acknowledged. As a consequence, end users working with sequences with repetitive regions are faced with 'ready-to-use' deposited data whose trustworthiness is difficult to determine, let alone to quantify. Here, we provide a review of the problems associated with tandem repeat sequences that originate from different stages during the sequencing-assembly-annotation-deposition workflow, and that may proliferate in public database repositories affecting all downstream analyses. As a case study, we provide examples of the Atlantic cod genome, whose sequencing and assembly were hindered by a particularly high prevalence of tandem repeats. We complement this case study with examples from other ... |
format |
Text |
author |
Tørresen, Ole K; Star, Bastiaan; Mier, Pablo; Andrade-Navarro, Miguel A; Bateman, Alex; Jarnot, Patryk; Gruca, Aleksandra; Grynberg, Marcin; Kajava, Andrey V; Promponas, Vasilis J; Anisimova, Maria; Jakobsen, Kjetill S; Linke, Dirk |
author_sort |
Tørresen, Ole K |
title |
Tandem repeats lead to sequence assembly errors and impose multi-level challenges for genome and protein databases ... |
publisher |
Oxford University Press |
publishDate |
2019 |
url |
https://dx.doi.org/10.21256/zhaw-18481 https://digitalcollection.zhaw.ch/handle/11475/18481 |
genre |
atlantic cod |
op_rights |
Creative Commons Attribution 4.0 International https://creativecommons.org/licenses/by/4.0/legalcode cc-by-4.0 |
op_doi |
https://doi.org/10.21256/zhaw-18481 |
_version_ |
1810431777495318528 |