Tandem repeats lead to sequence assembly errors and impose multi-level challenges for genome and protein databases ...

The widespread occurrence of repetitive stretches of DNA in genomes of organisms across the tree of life imposes fundamental challenges for sequencing, genome assembly, and automated annotation of genes and proteins. This multi-level problem can lead to errors in genome and protein databases that ar...


Bibliographic Details
Main Authors: Tørresen, Ole K, Star, Bastiaan, Mier, Pablo, Andrade-Navarro, Miguel A, Bateman, Alex, Jarnot, Patryk, Gruca, Aleksandra, Grynberg, Marcin, Kajava, Andrey V, Promponas, Vasilis J, Anisimova, Maria, Jakobsen, Kjetill S, Linke, Dirk
Format: Text
Language: English
Published: Oxford University Press 2019
Subjects: Genomics, Bioinformatics, FOS Computer and information sciences, 572 Biochemie
Online Access: https://dx.doi.org/10.21256/zhaw-18481
https://digitalcollection.zhaw.ch/handle/11475/18481
id ftdatacite:10.21256/zhaw-18481
record_format openpolar
spelling ftdatacite:10.21256/zhaw-18481 2024-09-15T17:55:30+00:00 2019 application/pdf en eng article-journal Journal Article Text ScholarlyArticle ftdatacite 2024-07-03T10:39:58Z
institution Open Polar
collection DataCite
op_collection_id ftdatacite
language English
topic Genomics
Bioinformatics
FOS Computer and information sciences
572 Biochemie
description The widespread occurrence of repetitive stretches of DNA in genomes of organisms across the tree of life imposes fundamental challenges for sequencing, genome assembly, and automated annotation of genes and proteins. This multi-level problem can lead to errors in genome and protein databases that are often not recognized or acknowledged. As a consequence, end users working with sequences with repetitive regions are faced with 'ready-to-use' deposited data whose trustworthiness is difficult to determine, let alone to quantify. Here, we provide a review of the problems associated with tandem repeat sequences that originate from different stages during the sequencing-assembly-annotation-deposition workflow, and that may proliferate in public database repositories affecting all downstream analyses. As a case study, we provide examples of the Atlantic cod genome, whose sequencing and assembly were hindered by a particularly high prevalence of tandem repeats. We complement this case study with examples from other ...
format Text
author Tørresen, Ole K
Star, Bastiaan
Mier, Pablo
Andrade-Navarro, Miguel A
Bateman, Alex
Jarnot, Patryk
Gruca, Aleksandra
Grynberg, Marcin
Kajava, Andrey V
Promponas, Vasilis J
Anisimova, Maria
Jakobsen, Kjetill S
Linke, Dirk
title Tandem repeats lead to sequence assembly errors and impose multi-level challenges for genome and protein databases ...
publisher Oxford University Press
publishDate 2019
url https://dx.doi.org/10.21256/zhaw-18481
https://digitalcollection.zhaw.ch/handle/11475/18481
genre atlantic cod
op_rights Creative Commons Attribution 4.0 International
https://creativecommons.org/licenses/by/4.0/legalcode
cc-by-4.0
op_doi https://doi.org/10.21256/zhaw-18481
_version_ 1810431777495318528