Protein genes in repetitive sequence—antifreeze glycoproteins in Atlantic cod genome

Highly repetitive sequences are the bane of genome sequence assembly, and the short read lengths produced by current next-generation sequencing technologies further exacerbate this obstacle. An adopted practice is to exclude repetitive sequences in genome data assembly, as the majority of repeats l...


Bibliographic Details
Published in: BMC Genomics
Main Authors: Zhuang, Xuan; Yang, Chun; Fevolden, Svein-Erik; Cheng, Chi-Hing Christina
Format: Article in Journal/Newspaper
Language: English
Published: 2012
Online Access: https://hdl.handle.net/10037/4993
https://doi.org/10.1186/1471-2164-13-293
id ftunivtroemsoe:oai:munin.uit.no:10037/4993
record_format openpolar
institution Open Polar
collection University of Tromsø: Munin Open Research Archive
op_collection_id ftunivtroemsoe
language English
topic VDP::Mathematics and natural science: 400::Basic biosciences: 470::Genetics and genomics: 474
VDP::Matematikk og Naturvitenskap: 400::Basale biofag: 470::Genetikk og genomikk: 474
description Highly repetitive sequences are the bane of genome sequence assembly, and the short read lengths produced by current next-generation sequencing technologies further exacerbate this obstacle. An adopted practice is to exclude repetitive sequences in genome data assembly, as the majority of repeats lack protein-coding genes. However, this could result in the exclusion of important genotypes in newly sequenced non-model species. The absence of the antifreeze glycoproteins (AFGP) gene family in the recently sequenced Atlantic cod genome serves as an example. The Atlantic cod (Gadus morhua) genome was assembled entirely from Roche 454 short reads, demonstrating the feasibility of this approach. However, a well-known major adaptive trait, the AFGP, essential for survival in frigid Arctic marine habitats, was absent from the annotated genome. To assess whether this resulted from population difference, we performed Southern blot analysis of genomic DNA from multiple individuals from the North East Arctic cod population to which the sequenced cod belonged, and verified that the AFGP genotype is indeed present. We searched the raw assemblies of the Atlantic cod using our G. morhua AFGP gene, and located partial AFGP coding sequences in two sequence scaffolds. We found that these two scaffolds constitute a partial genomic AFGP locus through comparative sequence analyses with our newly assembled genomic AFGP locus of the related polar cod, Boreogadus saida. By examining the sequence assembly and annotation methodologies used for the Atlantic cod genome, we deduced that the primary cause of the absence of the AFGP gene family from the annotated genome was the removal of all repetitive Roche 454 short reads before sequence assembly, which would exclude most of the highly repetitive AFGP coding sequences. Secondarily, the model teleost genomes used in projection annotation of the Atlantic cod genome have no antifreeze trait, which perpetuated the lack of awareness that the AFGP gene family was missing. We recovered some of the missing AFGP coding sequences and reconstructed a partial AFGP locus in the Atlantic cod genome, bringing to light that not all repetitive sequences lack protein-coding information. Also, reliance on genomes of model organisms as references for annotating the protein-coding gene content of a newly sequenced non-model species could lead to omission of novel genetic traits.
format Article in Journal/Newspaper
author Zhuang, Xuan
Yang, Chun
Fevolden, Svein-Erik
Cheng, Chi-Hing Christina
author_sort Zhuang, Xuan
title Protein genes in repetitive sequence—antifreeze glycoproteins in Atlantic cod genome
publishDate 2012
url https://hdl.handle.net/10037/4993
https://doi.org/10.1186/1471-2164-13-293
geographic Arctic
genre Arctic cod
Arctic
atlantic cod
Boreogadus saida
Gadus morhua
polar cod
op_relation BMC Genomics (2012), vol. 13 (293)
FRIDAID 933303
http://dx.doi.org/10.1186/1471-2164-13-293
1471-2164
https://hdl.handle.net/10037/4993
URN:NBN:no-uit_munin_4688
op_rights openAccess
op_doi https://doi.org/10.1186/1471-2164-13-293
container_title BMC Genomics
container_volume 13
container_issue 1
container_start_page 293
_version_ 1766304371093536768
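
Illustrative note on the abstract: the description above attributes the missing AFGP genes mainly to the removal of repetitive reads before assembly. The sketch below shows how a simple repeat filter can discard a read that is repetitive and yet protein-coding. It is not the filter used for the Atlantic cod assembly, which this record does not describe. The function names, the k-mer size, the threshold, and both sequences are hypothetical. The repeating unit assumes the Ala-Ala-Thr tripeptide repeat known for AFGPs in general (background knowledge, not stated in this record), with an arbitrary codon choice.

```python
from collections import Counter


def repeat_fraction(read: str, k: int = 8) -> float:
    """Fraction of k-mer positions whose k-mer occurs more than once in the read."""
    n = len(read) - k + 1
    if n <= 0:
        return 0.0
    kmers = [read[i:i + k] for i in range(n)]
    counts = Counter(kmers)
    repeated = sum(1 for kmer in kmers if counts[kmer] > 1)
    return repeated / n


def filter_repetitive(reads: dict, k: int = 8, max_fraction: float = 0.5):
    """Split read names into (kept, dropped) using a hypothetical repeat threshold."""
    kept, dropped = [], []
    for name, seq in reads.items():
        if repeat_fraction(seq, k) > max_fraction:
            dropped.append(name)
        else:
            kept.append(name)
    return kept, dropped


# Synthetic AFGP-like read: GCA GCA ACA encodes Ala-Ala-Thr (assumed codons).
AFGP_LIKE = "GCAGCAACA" * 10
# Synthetic read with no repeated 8-mers, standing in for ordinary sequence.
UNIQUE = "ATGGCTTCAGTCCGATTAGCACGTTACGGATCCTAGGCAATTCGGA"

if __name__ == "__main__":
    reads = {"afgp_like": AFGP_LIKE, "unique": UNIQUE}
    kept, dropped = filter_repetitive(reads)
    print("kept:", kept)
    print("dropped:", dropped)
```

In this toy case the coding read is dropped purely because of its internal repetition, which is the failure mode the abstract describes. A real pipeline would more likely flag repeats by over-representation across the whole read set, so this per-read measure is only a stand-in.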