Visualize gene structure with exon-intron diagrams, domain annotations, and mutation position markers. Produces SVG, PNG, or PDF figures suitable for publication from a gene symbol input.
Generate exon-intron structure diagrams for any gene symbol using the Ensembl REST API. Optionally overlay protein domain annotations (UniProt) and mark mutation hotspot positions. Outputs publication-ready SVG, PNG, or PDF figures.
✅ IMPLEMENTED: scripts/main.py is fully functional. Ensembl REST API, caching, matplotlib visualization, --domains, --mutations, and --demo are all implemented.
python -m py_compile scripts/main.py
python scripts/main.py --help
python scripts/main.py --demo --output demo.png
Fallback template: If scripts/main.py fails or the gene symbol is unrecognized, report: (a) the failure point, (b) whether a manual Ensembl/UCSC lookup can substitute, (c) which output formats are still generatable.
| Parameter | Type | Required | Description |
|---|---|---|---|
| --gene, -g | string | Yes* | Gene symbol or Ensembl ID (e.g., TP53, BRCA1, ENSG00000141510) |
| --species | string | No | Species name for Ensembl lookup (default: homo_sapiens) |
| --format | string | No | Output format: png, svg, pdf (default: png) |
| --output, -o | string | No | Output file path (default: `<gene>_structure.<format>`) |
| --domains | flag | No | Fetch and overlay UniProt protein domain annotations |
| --mutations | string | No | Comma-separated codon positions to mark (e.g., 248,273) |
| --demo | flag | No | Use hardcoded TP53 GRCh38 data (no internet required) |
*Required unless --demo is used.
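
The parameter table above could be expressed with `argparse` roughly as follows. This is a minimal sketch and not the contents of scripts/main.py: the function names `build_parser` and `parse_args` are hypothetical, and the choice to name the default demo output after TP53 is an assumption based on the demo using TP53 data.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented parameters (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="main.py",
        description="Draw an exon-intron structure diagram for a gene.",
    )
    parser.add_argument("--gene", "-g", help="Gene symbol or Ensembl ID, e.g. TP53")
    parser.add_argument("--species", default="homo_sapiens",
                        help="Species name for the Ensembl lookup")
    parser.add_argument("--format", choices=["png", "svg", "pdf"], default="png",
                        help="Output format")
    parser.add_argument("--output", "-o", help="Output file path")
    parser.add_argument("--domains", action="store_true",
                        help="Overlay UniProt protein domain annotations")
    parser.add_argument("--mutations",
                        help="Comma-separated codon positions, e.g. 248,273")
    parser.add_argument("--demo", action="store_true",
                        help="Use hardcoded TP53 data, no internet required")
    return parser


def parse_args(argv=None):
    """Parse arguments and apply the documented defaults."""
    parser = build_parser()
    args = parser.parse_args(argv)
    # --gene is required unless --demo is used.
    if not args.demo and not args.gene:
        parser.error("--gene is required unless --demo is used (e.g., --gene TP53)")
    if args.output is None:
        # Assumption: demo runs without --gene are named after TP53.
        gene = args.gene or "TP53"
        args.output = f"{gene}_structure.{args.format}"
    return args
```

Restricting `--format` with `choices` lets argparse reject unsupported formats before any network request is made.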
python scripts/main.py --gene TP53 --format png
python scripts/main.py --gene BRCA1 --format png --domains --output brca1_structure.png
python scripts/main.py --gene KRAS --mutations 12,13,61 --format pdf
python scripts/main.py --demo
python scripts/main.py --demo --output demo.svg --format svg
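
The `--mutations` examples above pass codon numbers, which have to be translated into genomic coordinates before a marker can be placed on the diagram. The following is a minimal sketch of one way to do that, not the mapping used by scripts/main.py. The function name `codon_to_genomic` and the `cds_segments` input are hypothetical. It assumes 1-based inclusive coordinates, a strand of 1 or -1, and that the coding portions of the exons have already been extracted.

```python
def codon_to_genomic(codon, cds_segments, strand):
    """Return the genomic position of the first base of a 1-based codon.

    cds_segments: list of (start, end) tuples, 1-based inclusive, covering
    only the coding part of each exon (assumed to be precomputed).
    strand: 1 for the forward strand, -1 for the reverse strand.
    """
    if codon < 1:
        raise ValueError("Codon positions are 1-based.")
    if strand not in (1, -1):
        raise ValueError("Strand must be 1 or -1.")

    # Offset of the codon's first base within the spliced coding sequence.
    offset = (codon - 1) * 3

    # Walk segments in transcription order: ascending on the forward
    # strand, descending on the reverse strand.
    ordered = sorted(cds_segments, reverse=(strand == -1))
    for start, end in ordered:
        length = end - start + 1
        if offset < length:
            return start + offset if strand == 1 else end - offset
        offset -= length

    raise ValueError(f"Codon {codon} lies beyond the end of the coding sequence.")


# Made-up coordinates: two coding segments of 6 bases each.
segments = [(100, 105), (200, 205)]
print(codon_to_genomic(3, segments, strand=1))   # 200
print(codon_to_genomic(2, segments, strand=-1))  # 202
```

A codon that spans an exon boundary is marked at its first base only, which is usually sufficient for a vertical marker at diagram scale.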
The script must implement:
- Ensembl lookup: GET https://rest.ensembl.org/lookup/symbol/homo_sapiens/{gene}?expand=1 to fetch exon coordinates.
- Caching and rate limiting: cache the response to .cache/{gene}_ensembl.json to avoid repeated API calls. Add a 0.1 s delay between requests for batch lookups. The unauthenticated rate limit is 15 requests/second.
- Gene not found: print Error: Gene not found: {gene_name}. Check the gene symbol and try again.
- Visualization: use matplotlib or svgwrite to draw exon blocks (filled rectangles) and intron lines scaled to genomic coordinates.
- --domains flag: fetch UniProt domain annotations and overlay colored domain blocks on the gene structure.
- --mutations flag: accept comma-separated codon positions; map to exon coordinates and draw vertical markers.
- --demo flag: use hardcoded TP53 GRCh38 exon coordinates (no internet required) to generate a demo visualization.

Limitations:

- Only the canonical transcript is drawn (Ensembl is_canonical flag). Other isoforms are not visualized.
- The domain overlay (--domains) maps UniProt amino acid positions to genomic coordinates using CDS length; accuracy may vary for genes with complex splicing.
- Ensembl responses are cached at .cache/{gene}_ensembl.json. Delete the cache file to force a fresh lookup.

- (--domains)
- (--mutations)
- (--demo)

Every response must make these explicit:
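
The Ensembl lookup, cache file, and request delay described in the implementation requirements could be sketched as below. This is an illustration under stated assumptions, not the code in scripts/main.py: `lookup_gene` and `_http_get_json` are hypothetical names, and the `fetch` parameter exists only so the network call can be swapped out in tests. Error handling for unknown gene symbols is left out for brevity.

```python
import json
import time
import urllib.request
from pathlib import Path

ENSEMBL_URL = "https://rest.ensembl.org/lookup/symbol/{species}/{gene}?expand=1"
CACHE_DIR = Path(".cache")
REQUEST_DELAY_S = 0.1  # pause after each live request, as the requirements ask


def _http_get_json(url):
    """Fetch a URL and decode the JSON body (hypothetical helper)."""
    request = urllib.request.Request(url, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def lookup_gene(gene, species="homo_sapiens", cache_dir=CACHE_DIR, fetch=_http_get_json):
    """Return the Ensembl lookup record for a gene, using the cache if present."""
    cache_file = Path(cache_dir) / f"{gene}_ensembl.json"

    # Serve from cache when possible so repeated runs make no API calls.
    if cache_file.exists():
        return json.loads(cache_file.read_text())

    data = fetch(ENSEMBL_URL.format(species=species, gene=gene))

    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(data))

    # Only live requests are delayed; cache hits return immediately.
    time.sleep(REQUEST_DELAY_S)
    return data
```

Sleeping only after a live request keeps batch lookups under the rate limit without slowing down runs that are served entirely from the cache.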
This skill accepts: gene symbol inputs for structure visualization, with optional domain and mutation overlays.
If the request does not involve gene structure visualization — for example, asking to perform sequence alignment, predict protein structure, or analyze expression data — do not proceed. Instead respond:
"gene-structure-mapper is designed to visualize gene exon-intron structure. Your request appears to be outside this scope. Please provide a gene symbol and desired output format, or use a more appropriate tool for your task."
- If --gene is missing, state that the gene symbol is required and provide an example.
- If the gene is not found, print Error: Gene not found: {gene_name}. Check the gene symbol and try again. and exit with code 1.
- If --mutations contains non-numeric values, reject with: Error: --mutations must be comma-separated integers (codon positions).
- If scripts/main.py fails, report the failure point, summarize what still can be completed safely, and provide a manual fallback.
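
The `--mutations` validation rule above could be implemented along these lines. This is a minimal sketch, not the check in scripts/main.py. The function name `parse_mutations` is hypothetical, and rejecting signed or non-ASCII digits is an assumption based on the values being codon positions.

```python
MUTATIONS_ERROR = "Error: --mutations must be comma-separated integers (codon positions)."


def parse_mutations(raw):
    """Parse a string such as '248,273' into a list of integer codon positions."""
    positions = []
    for token in raw.split(","):
        token = token.strip()
        # Assumption: only plain ASCII digits are valid, so empty tokens,
        # signs, decimals, and letters are all rejected.
        if not (token.isascii() and token.isdigit()):
            raise ValueError(MUTATIONS_ERROR)
        positions.append(int(token))
    return positions
```

A caller would typically catch the `ValueError`, print its message, and exit with a non-zero status.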