Supplementary data for: Chromosome-level genome assembly and circadian gene repertoire of the Patagonia blennie Eleginops maclovinus


Bibliographic Details
Main Author: Rivera-Colón, Angel
Format: Other/Unknown Material
Language: unknown
Published: Zenodo 2023
Online Access: https://doi.org/10.5281/zenodo.7829978
Description
Summary: This dataset contains the genome assembly and associated annotation of the Patagonian blennie (Eleginops maclovinus), the closest extant taxon to the Antarctic notothenioid radiation. In addition to the characterization of the E. maclovinus genome, the dataset includes a description of circadian rhythm orthologs for E. maclovinus, other notothenioid taxa, and teleost outgroups, as well as a copy of the bioinformatic scripts used for the assembly, annotation, and other downstream analyses.

All assembly and annotation files are gzipped but otherwise use standard bioinformatic formats (i.e., FASTA for the genome assembly and coding/amino acid sequences, GTF for annotation, AGP for scaffolding). The bioinformatic scripts for data generation and analysis are written in Python (*.py) or Bash (*.sh), but might require the installation of additional open-source software (e.g., wtdbg2, BRAKER). See the links for a description of the FASTA (http://www.ncbi.nlm.nih.gov/blast/fasta.shtml), GTF (https://useast.ensembl.org/info/website/upload/gff.html), and AGP (https://www.ncbi.nlm.nih.gov/assembly/agp/AGP_Specification/) file format specifications.

File format specification

File suffix [1]    Description
*.fa               Genome assembly in nucleotide FASTA format.
*.agp              Assembly structure in AGP format.
*.gtf              Genome annotation in GTF format.
*.cds.fa           Genomic sequence for all annotated protein-coding genes in nucleotide FASTA format.
*.protein.fa       Protein sequence for all annotated protein-coding genes in amino acid FASTA format.

[1] Does not include the gzipped compression suffix (*.gz).

Funding provided by: National Science Foundation
Crossref Funder Registry ID: http://dx.doi.org/10.13039/100000001
Award Number: 1645087

Funding provided by: National Science Foundation
Crossref Funder Registry ID: http://dx.doi.org/10.13039/100000001
Award Number: 11-42158
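
Because the FASTA files in this dataset are distributed gzipped, they can be read directly without decompressing them first. The following is a minimal sketch, not part of the deposited scripts, that uses only the Python standard library to list the length of each sequence in a gzipped FASTA file. The function names `read_fasta` and `sequence_lengths` are hypothetical, chosen for illustration, and the sketch assumes that the record identifier is the first whitespace-delimited token of each header line.

```python
import gzip
from typing import Dict, Iterator, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (record_id, sequence) pairs from an open text-mode FASTA handle."""
    name = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield name, "".join(chunks)
            # Assumption: the ID is the first whitespace-delimited token.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    # Emit the final record.
    if name is not None:
        yield name, "".join(chunks)


def sequence_lengths(path: str) -> Dict[str, int]:
    """Map each record ID in a gzipped FASTA file to its sequence length."""
    # "rt" opens the gzip stream in text mode, so lines arrive as str.
    with gzip.open(path, "rt") as handle:
        return {name: len(seq) for name, seq in read_fasta(handle)}
```

For example, `sequence_lengths("assembly.fa.gz")` would return a dictionary of scaffold or chromosome lengths, where `assembly.fa.gz` is a placeholder for whichever `*.fa.gz` file was downloaded from the record. The same reader applies to the `*.cds.fa` and `*.protein.fa` files, since they are also FASTA.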