Creating sense from non-sense DNA: de novo genesis and evolutionary history of antifreeze glycoprotein gene in northern cod fishes (Gadidae)

Bibliographic Details
Main Author: Zhuang, Xuan
Other Authors: Cheng, Chi-Hing C., DeVries, Arthur L., Olsen, Gary J., Kwast, Kurt E.
Language: English
Published: 2013
Online Access: http://hdl.handle.net/2142/46946
Description
Summary: Gadids (cod fishes) are ecologically prominent and important species of the north polar marine fauna. Seven gadid species are known to have evolved antifreeze glycoproteins (AFGPs) that enable them to survive inoculative freezing by environmental ice in frigid Arctic and sub-Arctic waters. AFGPs protect the fish from freezing death by binding to ice crystals and arresting ice growth within the fish. How and from where this important new genetic trait evolved in the gadid genome has remained unknown, because databases so far contain no homologous sequences from which to infer genetic ancestry. Additionally, the seven AFGP-bearing species are dispersed in separate clades of the gadid phylogeny, indicating a complicated evolutionary history of AFGP in the gadid lineage.

The AFGP loci in three gadid species were isolated and characterized to investigate the genetic origin and evolutionary pathway of the AFGP gene family in the gadid lineage. Two of these species are polar cod Boreogadus saida and Atlantic tomcod Microgadus tomcod, which occur in distinct clades representing two possible separate origins of the gadid AFGP. A large-insert DNA bacterial artificial chromosome (BAC) library for each species was constructed and screened to isolate the AFGP-containing clones. Nine BAC clones of B. saida were selected by fingerprint analyses and sequenced on the Roche 454 GS FLX+ platform. Sequence assembly produced three AFGP gene clusters totaling 16 AFGPs, spanning a combined distance of ≥ 500 kbp. The M. tomcod AFGP locus is much smaller (approximately 80 kbp), resides within a single BAC clone, and contains four AFGPs. A third AFGP-bearing gadid, Atlantic cod Gadus morhua, was included when its genome sequence was published. G. morhua and B. saida belong to the same clade, so the G. morhua AFGP genomic locus provides a useful added dimension in analyzing gadid AFGP evolution. Strangely, however, the published G. morhua genome was missing the AFGP locus because the repetitive AFGP coding sequences (cds) were excluded along with the presumed gene-less simple repeats prior to genome assembly. The complete G. morhua AFGP locus was reconstructed in this study by sequencing the pertinent BAC clones obtained from the publicly available BAC library. The G. morhua AFGP locus contains seven AFGPs, orthologous to AFGP cluster I in B. saida, suggesting this locus evolved in their common ancestor. The AFGP family sizes of these three species correlate with the severity of the environmental selective pressure in their habitats.

The complete gene structures of these gadid AFGPs were determined, inclusive of the signal peptide cds, which was mis-annotated previously. AFGP genes have three exons and two introns. The approximately 400 bp 5’ portion is non-repetitive, comprising exon 1 through intron 2, with the signal peptide cds located in exon 1 and exon 2. Exon 3 encodes the long AFGP (Thr-Ala-Ala)n repeats, which range from 500 bp to 5,000 bp in length in different genes. The threonine in the tripeptide repeats is occasionally substituted by arginine or lysine, which are the putative cleavage sites of the AFGP polyprotein. Most intact AFGPs in the three species have one pair of 27-nucleotide (nt) repeats flanking each end of the AFGP tripeptide repeat cds, which supports the hypothesis that de novo duplication of the 9-nt element in the middle created the repetitive AFGP cds.

Comparison of these AFGPs with non-coding AFGP homologous sequences in three gadid species that lack AFGP reveals that a putative promoter region was only present in the species most closely related to the AFGP-bearing gadids, while the homologs of the start codon and signal peptide are conserved in all three species, including a basal freshwater cod. Thus we deduce the process of non-coding to coding DNA evolution of gadid AFGP as follows.
The recruitment of an upstream regulatory sequence made possible the transcription of a non-coding region that encompasses the precursor of the signal peptide cds, and four copies of a 27-nt repeat containing a short stretch of GCA (encoding Ala) trinucleotide repeats in its center. A single nucleotide mutation changed one of the Ala codons to a Thr codon (ACA), creating the coding element for Thr-Ala-Ala, the tripeptide repeat of AFGP. An additional indel rendered the precursor signal peptide cds in-frame with the tripeptide coding element. Thus functionalized, the primordial gene became transcribed and translated into the novel, secreted protein with rudimentary ice-binding activities. Under selection by marine glaciation, the tripeptide repeat coding element became extensively duplicated, creating the AFGP repetitive tripeptide cds and full-fledged antifreeze function.

The AFGP loci of G. morhua and of M. tomcod exhibit high sequence identities (85%-99%) to B. saida AFGP cluster I, indicating they are homologous genomic regions. This shared microsynteny indicates that the AFGP gene of the phylogenetically separated AFGP-bearing clades of gadids evolved from the same genomic region. The two additional AFGP clusters in B. saida indicate further expansion of the AFGP gene family in response to its harsh habitat in the chronically frigid high Arctic seas. The sequence analysis reveals that the two AFGPs in B. saida AFGP cluster II were likely created as an insertion in an originally AFGP-lacking region, resulting from a segmental duplication of a region containing the last two genes in cluster III, and all seven genes in cluster III were derived from one of the AFGPs in cluster I. The phylogenetic reconstruction of all AFGPs in three gadids indicates orthologous gene members in B. saida and G. morhua (same clade species), but not in M. tomcod (separate clade), suggesting independent gene family expansion in different clades, which is likely the result of encountering temporally separated selective forces from different glacial advances and retreats in the northern hemisphere.

The study of the genomic origin and molecular mechanism of AFGP evolution in gadid fishes tests the intriguing hypothesis of sense from non-sense DNA evolution creating a quintessential life-saving gene. As far as we know, this is the first clear example of a truly de novo origination of an entire gene with a well-known and major adaptive function that can be directly linked to evolutionary demand imposed by natural selection on the organism. Additionally, analyses of how the gadid AFGP genomic loci expanded shed light on how functional genes proliferate into a large family when driven by natural selection.
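
As a minimal illustration of the point-mutation and duplication steps proposed in the summary, the following Python sketch shows how a single G-to-A change turns a run of GCA (Ala) codons into the 9-nt Thr-Ala-Ala coding element, and how tandem duplication of that element extends the tripeptide repeat. The sequences, the copy number, and the helper names (`translate`, `point_mutate`) are hypothetical and chosen only for illustration; they are not taken from the thesis data, and only the two codons named in the summary are included in the lookup table.

```python
# Illustrative sketch only: sequences and copy numbers are hypothetical,
# not data from the thesis.

# Minimal subset of the standard genetic code (only the codons named in the summary).
CODON_TABLE = {"GCA": "Ala", "ACA": "Thr"}


def translate(dna):
    """Translate an in-frame DNA string codon by codon using CODON_TABLE."""
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    return "-".join(CODON_TABLE[codon] for codon in codons)


def point_mutate(dna, position, base):
    """Return a copy of dna with the nucleotide at `position` replaced by `base`."""
    return dna[:position] + base + dna[position + 1:]


# A short stretch of GCA trinucleotide repeats (assumed length: three codons).
ancestral = "GCA" * 3

# A single G-to-A substitution in the first codon turns GCA (Ala) into ACA (Thr).
mutated = point_mutate(ancestral, 0, "A")

# Tandem duplication of the 9-nt element (four copies assumed for illustration).
expanded = mutated * 4

print(translate(ancestral))
print(translate(mutated))
print(translate(expanded))
```

The copy number of four is arbitrary; by the same arithmetic, the 500 bp to 5,000 bp exon 3 lengths reported in the summary would correspond to roughly 55 to 555 copies of a 9-nt element, ignoring flanking sequence and cleavage-site substitutions.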