Gene and repeat annotation for snowy owl (Bubo scandiacus) and selected species

Bibliographic Details
Main Authors: Baalsrud, Helle Tessand; Tørresen, Ole Kristian
Format: Other/Unknown Material
Language: unknown
Published: Zenodo 2024
Online Access: https://doi.org/10.5281/zenodo.12643816
Description
Summary: Here we provide the gene and repeat annotation for the snowy owl (Bubo scandiacus), as well as gene and repeat annotations for several species it was compared with. Unfortunately, it is currently not possible to upload repeat annotation tracks to an international nucleotide sequence database such as ENA. Uploading the gene annotation is possible, but some of the cross-references to other databases in the functional annotation are removed. Furthermore, the sequence names in the publicly available genome assemblies on ENA differ from those used in the annotation tracks here, so we also provide the FASTA files for the snowy owl assemblies (bBubSca1.1.hap1.fasta.gz and bBubSca1.1.hap2.fasta.gz). Ideally, all of this would have been available via ENA.

We annotated the snowy owl genome assemblies, as well as those of the downy woodpecker (Dryobates pubescens, GCA_014839835.1), northern carmine bee-eater (Merops nubicus, GCA_009819595.1), northern goshawk (Accipiter gentilis, GCA_929443795.2) and barn owl (Tyto alba, GCF_018691265.1), since no genome annotation was publicly available for these species. We used a pre-release version of the EBP-Nor genome annotation pipeline (https://github.com/ebp-nor/GenomeAnnotation).

First, the AGAT (https://zenodo.org/record/7255559) scripts agat_sp_keep_longest_isoform.pl and agat_sp_extract_sequences.pl were applied to the GRCg7b (GCA_016699485.1) chicken genome assembly and annotation to generate one protein (the longest isoform) per gene. Miniprot (Li, 2023) was used to align these proteins to the curated assemblies. UniProtKB/Swiss-Prot (Consortium et al., 2022) release 2022_03 and the Vertebrata subset of OrthoDB v11 (Kuznetsov et al., 2022) were also aligned separately to the assemblies. Red (Girgis, 2015) was run via redmask (https://github.com/nextgenusfs/redmask) on the snowy owl assemblies to mask repetitive regions; for the other species, we used the soft-masked genome assemblies available at NCBI.
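The step that reduces the chicken annotation to one protein per gene can be illustrated with a short Python sketch. It assumes a GFF3 file in which mRNA (or transcript) features carry ID and Parent attributes and CDS features point to their transcript via Parent. For each gene, it keeps the transcript with the most CDS bases. This is our reading of the idea behind agat_sp_keep_longest_isoform.pl, not AGAT's code. The function names are hypothetical, and AGAT handles cases this sketch ignores, such as loci whose isoforms lack CDS features.

```python
from collections import defaultdict


def parse_attributes(field):
    """Parse a GFF3 column-9 string such as 'ID=t1;Parent=g1' into a dict."""
    attrs = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = value
    return attrs


def longest_isoforms(gff_lines):
    """Return {gene_id: transcript_id}, keeping the transcript with the most CDS bases."""
    transcript_gene = {}        # transcript ID -> parent gene ID
    cds_len = defaultdict(int)  # transcript ID -> summed CDS length in bp
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments, directives and blank lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a GFF3 feature line
        ftype, start, end = cols[2], int(cols[3]), int(cols[4])
        attrs = parse_attributes(cols[8])
        if ftype in ("mRNA", "transcript"):
            transcript_gene[attrs["ID"]] = attrs["Parent"]
        elif ftype == "CDS":
            # A CDS segment may list several comma-separated parents in GFF3.
            for parent in attrs.get("Parent", "").split(","):
                if parent:
                    cds_len[parent] += end - start + 1  # 1-based, inclusive coordinates
    best = {}
    for tx, gene in transcript_gene.items():
        # Strict '>' keeps the first transcript seen when lengths tie.
        if gene not in best or cds_len[tx] > cds_len[best[gene]]:
            best[gene] = tx
    return best


# Toy example: gene g1 has two isoforms; t2 has 390 bp of CDS versus 90 bp for t1.
example = [
    "##gff-version 3",
    "chr1\tsrc\tgene\t1\t1000\t.\t+\t.\tID=g1",
    "chr1\tsrc\tmRNA\t1\t1000\t.\t+\t.\tID=t1;Parent=g1",
    "chr1\tsrc\tCDS\t10\t99\t.\t+\t0\tID=c1;Parent=t1",
    "chr1\tsrc\tmRNA\t1\t1000\t.\t+\t.\tID=t2;Parent=g1",
    "chr1\tsrc\tCDS\t10\t99\t.\t+\t0\tID=c2;Parent=t2",
    "chr1\tsrc\tCDS\t200\t499\t.\t+\t0\tID=c3;Parent=t2",
]
print(longest_isoforms(example))  # {'g1': 't2'}
```

The selected transcript IDs would then be the input for extracting one protein sequence per gene, which the pipeline did with agat_sp_extract_sequences.pl.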
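Soft-masking, the convention of the NCBI assemblies mentioned above, marks repetitive bases by writing them in lowercase while leaving the sequence itself unchanged. A minimal Python sketch of reading that convention back computes the fraction of masked bases in a plain or gzipped FASTA file. It counts only A/C/G/T in either case, so N gaps and other ambiguity codes are excluded from both the numerator and the denominator. This choice, the function names and the file name in the commented usage line are assumptions for illustration.

```python
import gzip


def open_fasta(path):
    """Open a plain or gzip-compressed FASTA file in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def masked_fraction(lines):
    """Return (masked, total) counts of A/C/G/T bases; lowercase counts as masked."""
    masked = total = 0
    for line in lines:
        if line.startswith(">"):
            continue  # FASTA header line
        seq = line.strip()
        lower = sum(seq.count(b) for b in "acgt")
        upper = sum(seq.count(b) for b in "ACGT")
        masked += lower
        total += lower + upper  # N and other IUPAC codes are ignored
    return masked, total


demo = [">seq1", "ACGTacgtNN", "acgtACGT"]
masked, total = masked_fraction(demo)
print(masked, total, masked / total)  # 8 16 0.5

# Applied to a real soft-masked assembly (hypothetical file name):
# with open_fasta("assembly.softmasked.fa.gz") as fh:
#     print(masked_fraction(fh))
```

Counting with str.count per line keeps the loop fast enough for a bird-sized genome without extra dependencies.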
GALBA (Brůna et al., 2023; Buchfink et al., 2015; ...