Reference for Atlantic salmon GFF3 file structure (Ensembl and NCBI annotations on Ssal_v3.1). Use this when working with GFF files, parsing gene attributes, or writing scripts that process annotation data.

## Files

- Ensembl: `data/genomes/AtlanticSalmon/Ssal_v3.1_Ens.gff3` (plain text)
- NCBI: `data/genomes/AtlanticSalmon/Ssal_v3.1_NCBI.gff3` (plain text)

## Ensembl GFF3

### Gene-level feature types

- `gene` — protein-coding genes (source: `ensembl`)
- `ncRNA_gene` — non-coding RNA genes (source: `ensembl` or `ncrna`)
- `pseudogene` — pseudogenes (source: `ensembl`)

### Example gene attributes (column 9)

```
ID=gene:ENSSSAG00000110109;Name=ube2d3;biotype=protein_coding;description=ubiquitin-conjugating enzyme E2D 3 [Source:ZFIN%3BAcc:ZDB-GENE-030131-551];gene_id=ENSSSAG00000110109;logic_name=ensembl;version=1
```
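
An Ensembl attribute string like the example can be split into key/value pairs with the standard library. This is a minimal sketch, assuming a tab-separated nine-column GFF3 file with percent-encoded values (as `%3B` in the example suggests); `parse_attributes`, `split_description` and `iter_ensembl_genes` are hypothetical helper names, not part of any library.

```python
import re
from urllib.parse import unquote

# Gene-level feature types (GFF3 column 3) in the Ensembl file
ENSEMBL_GENE_TYPES = {"gene", "ncRNA_gene", "pseudogene"}

# Matches a trailing " [Source:XXX;Acc:YYY]" once the value is URL-decoded
SOURCE_SUFFIX = re.compile(r"\s*\[Source:([^;\]]+);Acc:([^\]]+)\]$")


def parse_attributes(col9):
    """Split GFF3 column 9 into a dict, URL-decoding each value.

    Splitting on ';' happens before decoding, so an encoded '%3B'
    inside a value does not break the field apart.
    """
    attrs = {}
    for field in col9.strip().split(";"):
        if field:
            key, _, value = field.partition("=")
            attrs[key] = unquote(value)
    return attrs


def split_description(description):
    """Return (text, source, accession); source/accession are None if absent."""
    match = SOURCE_SUFFIX.search(description)
    if match is None:
        return description, None, None
    return description[: match.start()], match.group(1), match.group(2)


def iter_ensembl_genes(path):
    """Yield (feature_type, attrs) for gene-level features in a GFF3 file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue  # skip directives, comments and blank lines
            cols = line.rstrip("\n").split("\t")
            if len(cols) == 9 and cols[2] in ENSEMBL_GENE_TYPES:
                yield cols[2], parse_attributes(cols[8])
```

For the example line, `split_description` returns the bare description plus `"ZFIN"` and `"ZDB-GENE-030131-551"`, and stripping `gene:` from `ID` reproduces `gene_id`.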

### Key attributes

- `ID` — always prefixed with `gene:` (e.g., `gene:ENSSSAG00000110109`)
- `Name` — optional; only ~44% of genes have it
- `biotype` — e.g., `protein_coding`, `lncRNA`, `pseudogene`, `snRNA`, `snoRNA`, `miRNA`, `tRNA`, `rRNA`
- `description` — URL-encoded; ends with a bracketed source suffix such as `[Source:ZFIN%3BAcc:ZDB-GENE-030131-551]` (decoded form: `[Source:XXX;Acc:YYY]`)
- `gene_id` — same as `ID` without the `gene:` prefix
- `logic_name` — `ensembl` or `ncrna`

### Name vs. description

- If a gene has `Name=`, it always also has `description=`.
- Some genes have `description=` without `Name=`.

## NCBI GFF3

### Gene-level feature types

- `gene` only (no `ncRNA_gene`/`pseudogene` distinction at the feature level)

### Example gene attributes (column 9)

```
ID=gene-LOC123729278;Dbxref=GeneID:123729278;Name=LOC123729278;description=lethal(3)malignant brain tumor-like protein 2;gbkey=Gene;gene=LOC123729278;gene_biotype=protein_coding
```
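
In the NCBI file the numeric gene ID sits inside `Dbxref` rather than in its own attribute, so extracting it takes one extra step. A minimal sketch, assuming `Dbxref` may hold several comma-separated entries (the GFF3 convention for multi-valued attributes) and that values are percent-encoded; `parse_raw_attributes` and `ncbi_gene_record` are hypothetical helper names.

```python
from urllib.parse import unquote


def parse_raw_attributes(col9):
    """Split GFF3 column 9 into {key: raw_value}, leaving %XX escapes intact."""
    attrs = {}
    for field in col9.strip().split(";"):
        if field:
            key, _, value = field.partition("=")
            attrs[key] = value
    return attrs


def ncbi_gene_record(col9):
    """Pull the commonly needed fields out of an NCBI gene line's column 9."""
    raw = parse_raw_attributes(col9)
    gene_id = None
    # Split Dbxref on raw commas first, then decode each entry
    for xref in raw.get("Dbxref", "").split(","):
        db, _, acc = unquote(xref).partition(":")
        if db == "GeneID":
            gene_id = int(acc)
            break
    name = unquote(raw["Name"])  # assumed present on every NCBI gene line
    return {
        "id": unquote(raw["ID"]).removeprefix("gene-"),
        "gene_id": gene_id,
        "name": name,
        "biotype": unquote(raw.get("gene_biotype", "")),  # NOT "biotype"
        "description": unquote(raw.get("description", "")),
        "is_loc": name.startswith("LOC"),
    }
```

Values stay raw until after the `Dbxref` split, so an encoded comma (`%2C`) is never mistaken for a separator, while `description` still comes back with real commas.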

### Key attributes

- `ID` — prefixed with `gene-` (e.g., `gene-LOC123729278` or `gene-wdr32`)
- `Name` — always present, but often an auto-assigned LOC ID
- `Dbxref` — contains `GeneID:NNNNN` (the NCBI numeric gene ID)
- `gene_biotype` — (note: NOT `biotype` as in Ensembl) e.g., `protein_coding`, `lncRNA`, `tRNA`, `rRNA`, `snRNA`, `snoRNA`, `misc_RNA`
- `description` — plain text, with special characters URL-encoded (e.g., `%2C` for a comma)
- `gene` — same as `Name`
- `gbkey` — always `Gene`

### Gene names

- Every gene has `Name=`, but most are auto-assigned LOC IDs (starting with "LOC").
- The rest use gene symbols (e.g., `wdr32`, `manba`).

```python
is_loc = name.startswith("LOC")
```

## Key differences

| Aspect | Ensembl | NCBI |
|---|---|---|
| Gene ID prefix | gene: | gene- |
| Gene feature types | gene, ncRNA_gene, pseudogene | gene only |
| Biotype attribute | biotype | gene_biotype |
| Name attribute | Optional (~44% have it) | Always present (but often LOC) |
| Numeric ID | In gene_id attribute | In Dbxref=GeneID: |
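
The comparison table reduces to a small lookup of per-source conventions. A sketch of normalizing one gene line's column 9 into a common record, assuming the caller already knows which file the line came from; `CONVENTIONS`, `gene_id_for` and `normalize_gene` are hypothetical names, and the returned `gene_id` is a string for both sources (an Ensembl stable ID or an NCBI GeneID).

```python
from urllib.parse import unquote

# Per-source conventions taken from the comparison table
CONVENTIONS = {
    "ensembl": {"id_prefix": "gene:", "biotype_key": "biotype"},
    "ncbi": {"id_prefix": "gene-", "biotype_key": "gene_biotype"},
}


def parse_attributes(col9):
    """Split GFF3 column 9 into {key: raw_value} without decoding."""
    attrs = {}
    for field in col9.strip().split(";"):
        if field:
            key, _, value = field.partition("=")
            attrs[key] = value
    return attrs


def gene_id_for(attrs, source):
    """Ensembl: the gene_id attribute. NCBI: the GeneID entry in Dbxref."""
    if source == "ensembl":
        value = attrs.get("gene_id")
        return unquote(value) if value is not None else None
    for xref in attrs.get("Dbxref", "").split(","):
        db, _, acc = unquote(xref).partition(":")
        if db == "GeneID":
            return acc
    return None


def normalize_gene(col9, source):
    """Map one gene line's attributes onto source-independent keys."""
    conv = CONVENTIONS[source]
    attrs = parse_attributes(col9)
    name = attrs.get("Name")  # frequently missing in the Ensembl file
    return {
        "source": source,
        "id": unquote(attrs["ID"]).removeprefix(conv["id_prefix"]),
        "gene_id": gene_id_for(attrs, source),
        "name": unquote(name) if name is not None else None,
        "biotype": unquote(attrs.get(conv["biotype_key"], "")),
    }
```

Returning `None` for a missing `Name` keeps Ensembl's partial name coverage visible instead of silently substituting the ID.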
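
1, 2, ..., 29

## Cross-species gene table (`#tax_id` / `Other_tax_id`)

- Each row pairs a gene of the species in `tax_id` with a gene of the "other" species in `Other_tax_id`.
- Salmon (taxon 8030) appears only in the `Other_tax_id` column (never in `tax_id`).
- For human/salmon pairs, filter `tax_id == 9606 & Other_tax_id == 8030`; then `GeneID` is the human gene and `Other_GeneID` the salmon gene.
- The header's first column is `#tax_id`, so it needs renaming after reading.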
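
The filtering rules for the cross-species table can be sketched with the standard-library `csv` module. This assumes the table is tab-separated with a single header line beginning with `#tax_id`; the column names come from the notes, while `human_salmon_pairs` is a hypothetical helper. With pandas, the same steps would be `read_csv(..., sep="\t")`, `rename(columns={"#tax_id": "tax_id"})`, and a boolean filter on the two taxonomy columns.

```python
import csv

HUMAN_TAX_ID = 9606
SALMON_TAX_ID = 8030  # Salmo salar


def human_salmon_pairs(handle):
    """Yield (human_GeneID, salmon_GeneID) pairs from a tab-separated table."""
    reader = csv.DictReader(handle, delimiter="\t")
    if reader.fieldnames is None:
        return  # empty input
    # The header starts with '#', so the first column is literally '#tax_id'
    reader.fieldnames = [name.lstrip("#") for name in reader.fieldnames]
    for row in reader:
        # Salmon only ever appears on the "other" side, so human must be tax_id
        if int(row["tax_id"]) == HUMAN_TAX_ID and int(row["Other_tax_id"]) == SALMON_TAX_ID:
            yield int(row["GeneID"]), int(row["Other_GeneID"])
```

Usage: `with open(path, newline="") as fh: pairs = list(human_salmon_pairs(fh))`, where `path` points at the table.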