Supplementary data for: Chromosome-level genome assembly and circadian gene repertoire of the Patagonia blennie Eleginops maclovinus

Bibliographic Details
Main Author: Rivera-Colón, Angel
Format: Software
Language: unknown
Published: 2023
Subjects:
Online Access: https://zenodo.org/record/7829978
https://doi.org/10.5281/zenodo.7829978
Description
Summary: This dataset contains the genome assembly and associated annotation of the Patagonian Blennie (Eleginops maclovinus), the closest extant taxon to the Antarctic notothenioid radiation. In addition to the characterization of the E. maclovinus genome, the dataset includes a description of circadian rhythm orthologs for E. maclovinus, other notothenioid taxa, and teleost outgroups, as well as a copy of the bioinformatic scripts used for the assembly, annotation, and other downstream analyses.

All assembly and annotation files are gzipped, but are otherwise in standard bioinformatic formats (i.e., FASTA for genome assembly and coding/amino acid sequences, GTF for annotation, AGP for scaffolding). The bioinformatic scripts for data generation and analysis are written in Python (*.py) or Bash (*.sh), but might require the installation of additional open-source software (e.g., wtdbg2, BRAKER).

See the following links for descriptions of the FASTA (http://www.ncbi.nlm.nih.gov/blast/fasta.shtml), GTF (https://useast.ensembl.org/info/website/upload/gff.html), and AGP (https://www.ncbi.nlm.nih.gov/assembly/agp/AGP_Specification/) file format specifications.

File format specification

File suffix [1]    Description
*.fa               Genome assembly in nucleotide FASTA format.
*.agp              Assembly structure in AGP format.
*.gtf              Genome annotation in GTF format.
*.cds.fa           Genomic sequence for all annotated protein-coding genes in nucleotide FASTA format.
*.protein.fa       Protein sequence for all annotated protein-coding genes in amino acid FASTA format.

[1] Does not include the gzipped compression suffix (*.gz).

Funding provided by: National Science Foundation
Crossref Funder Registry ID: http://dx.doi.org/10.13039/100000001
Award Number: 1645087

Funding provided by: National Science Foundation
Crossref Funder Registry ID: http://dx.doi.org/10.13039/100000001
Award Number: 11-42158

An E. maclovinus specimen was collected from Puerto Natales, Chile, in January 2018.
HMW DNA was extracted and sequenced using PacBio Sequel II and a Hi-C library. A contig-level genome ...
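
Because the assembly and sequence files described in the summary are gzipped FASTA, they can be read without decompressing them on disk. The following is a minimal sketch, not one of the scripts shipped with the dataset: it uses only the Python standard library, and the function names read_fasta and sequence_lengths, as well as the file name in the usage note, are hypothetical and chosen for illustration.

```python
import gzip
from typing import Dict, Iterator, List, Optional, Tuple


def read_fasta(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (record_id, sequence) pairs from a plain or gzipped FASTA file."""
    # Assumption: gzipped files carry the ".gz" suffix, as in this dataset.
    opener = gzip.open if path.endswith(".gz") else open
    name: Optional[str] = None
    chunks: List[str] = []
    with opener(path, "rt") as handle:
        for raw in handle:
            line = raw.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                # A new header closes the previous record, if there was one.
                if name is not None:
                    yield name, "".join(chunks)
                fields = line[1:].split()
                # Keep only the identifier (text before the first whitespace).
                name = fields[0] if fields else ""
                chunks = []
            else:
                chunks.append(line)
    # Emit the final record once the file is exhausted.
    if name is not None:
        yield name, "".join(chunks)


def sequence_lengths(path: str) -> Dict[str, int]:
    """Map each record identifier to the length of its sequence."""
    return {name: len(seq) for name, seq in read_fasta(path)}
```

For example, calling sequence_lengths("assembly.fa.gz") on a downloaded assembly file (the file name here is a placeholder, not the actual name in the archive) would return one entry per scaffold or chromosome. The same reader works for the *.cds.fa and *.protein.fa files, since it does not interpret the sequence alphabet.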