Name: COMPUTE, DON'T DESCRIBE
Author: mims-harvard

Perform comprehensive gene enrichment and pathway analysis using gseapy (ORA and GSEA), PANTHER, STRING, Reactome, and 40+ ToolUniverse tools. Supports GO enrichment (BP, MF, CC), KEGG, Reactome, WikiPathways, MSigDB Hallmark, and 220+ Enrichr libraries. Handles multiple ID types (gene symbols, Ensembl, Entrez, UniProt), multiple organisms (human, mouse, rat, fly, worm, yeast), customizable backgrounds, and multiple testing correction (BH, Bonferroni). Use when users ask about gene enrichment, pathway analysis, GO term enrichment, KEGG pathway analysis, GSEA, over-representation analysis, functional annotation, or gene set analysis.

mims-harvard · 1,271 stars · 29.03.2026

Profession
Categories: Bioinformatics

COMPUTE, DON'T DESCRIBE

When analysis requires computation (statistics, data processing, scoring, enrichment), write and run Python code via Bash. Don't describe what you would do — execute it and report actual results. Use ToolUniverse tools to retrieve data, then Python (pandas, scipy, statsmodels, matplotlib) to analyze it.

Gene Enrichment and Pathway Analysis

Perform comprehensive gene enrichment analysis including Gene Ontology (GO), KEGG, Reactome, WikiPathways, and MSigDB enrichment using both Over-Representation Analysis (ORA) and Gene Set Enrichment Analysis (GSEA). Integrates local computation via gseapy with ToolUniverse pathway databases for cross-validated, publication-ready results.
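As a rough illustration of what ORA computes, the sketch below scores each gene set with a one-sided hypergeometric test using only the standard library. The gene identifiers, the set names `SET_A` and `SET_B`, and the helper names `hypergeom_sf` and `ora` are made up for this example; in practice gseapy or `scipy.stats.hypergeom` would normally do this work.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) when n genes are drawn from N, of which K belong to the gene set."""
    upper = min(K, n)
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total

def ora(gene_list, gene_sets, background):
    """Return (term, overlap, set_size, p_value) rows sorted by p-value."""
    background = set(background)
    genes = set(gene_list) & background      # ignore genes outside the background
    N, n = len(background), len(genes)
    rows = []
    for term, members in gene_sets.items():
        members = set(members) & background  # restrict each set to the background
        k = len(genes & members)
        rows.append((term, k, len(members), hypergeom_sf(k, N, len(members), n)))
    return sorted(rows, key=lambda row: row[3])

# Toy data (hypothetical identifiers): 100 background genes, two gene sets.
background = {f"G{i}" for i in range(100)}
gene_sets = {
    "SET_A": [f"G{i}" for i in range(0, 10)],
    "SET_B": [f"G{i}" for i in range(50, 60)],
}
gene_list = [f"G{i}" for i in range(0, 5)] + [f"G{i}" for i in range(20, 25)]

results = ora(gene_list, gene_sets, background)
for term, k, size, p in results:
    print(f"{term}\toverlap={k}/{size}\tp={p:.3g}")
```

The p-values here are raw; a real analysis would still apply multiple testing correction across all tested terms.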

IMPORTANT: Always use English terms in tool calls (gene names, pathway names, organism names), even if the user writes in another language. Only try original-language terms as a fallback if English returns no results. Respond in the user's language.

Domain Reasoning: Background Selection

Enrichment results are only as good as your background. The default background (all annotated genes in the genome) inflates enrichment for tissue-specific or context-specific gene lists. Always consider: what is the appropriate background for this experiment? For brain RNA-seq, use brain-expressed genes as background; for a proteomics experiment, use detected proteins. A gene that is never expressed in your system cannot be a true negative control.
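The effect of the background can be seen with a small counting exercise. The sketch below is a toy example with synthetic gene identifiers; `fold_enrichment` is a hypothetical helper, not part of gseapy or ToolUniverse. It compares the fold enrichment of the same gene list against a genome-wide background and against an expressed-genes background.

```python
def fold_enrichment(gene_list, gene_set, background):
    """Return (k, n, K, N, fold) after restricting everything to the background."""
    background = set(background)
    genes = set(gene_list) & background
    members = set(gene_set) & background
    k, n, K, N = len(genes & members), len(genes), len(members), len(background)
    if n == 0 or K == 0:
        return k, n, K, N, float("nan")
    # Observed fraction of the list in the set, relative to the expected fraction.
    return k, n, K, N, (k / n) / (K / N)

# Synthetic setup: 200 annotated genes, only 80 of them expressed in the tissue.
genome = [f"G{i}" for i in range(200)]
expressed = [f"G{i}" for i in range(80)]
# A tissue-specific gene set whose members are all expressed.
tissue_set = [f"G{i}" for i in range(20)]
# A gene list from the experiment: 20 expressed genes, 5 of them in the set.
gene_list = [f"G{i}" for i in range(0, 5)] + [f"G{i}" for i in range(40, 55)]

for label, bg in (("genome-wide", genome), ("expressed only", expressed)):
    k, n, K, N, fold = fold_enrichment(gene_list, tissue_set, bg)
    print(f"{label}: k={k} n={n} K={K} N={N} fold={round(fold, 2)}")
```

In this toy setup the apparent 2.5-fold enrichment against the genome drops to 1.0 once the background is limited to expressed genes, which is the inflation described above.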

Input Parameters

Parameter	Required	Description	Example
gene_list	Yes	List of gene symbols, Ensembl IDs, or Entrez IDs	`["TP53", "BRCA1", "EGFR"]`
organism	No	Organism (default: human). Supported: human, mouse, rat, fly, worm, yeast, zebrafish	`human`
analysis_type	No	`ORA` (default) or `GSEA`	`ORA`
enrichment_databases	No	Which databases to query. Default: all applicable	`["GO_BP", "GO_MF", "GO_CC", "KEGG", "Reactome"]`
gene_id_type	No	Input ID type: `symbol`, `ensembl`, `entrez`, `uniprot` (auto-detected if omitted)	`symbol`
p_value_cutoff	No	Significance threshold (default: 0.05)	`0.05`
correction_method	No	Multiple testing: `BH` (Benjamini-Hochberg, default), `bonferroni`, `fdr`	`BH`
background_genes	No	Custom background gene set (default: genome-wide)	`["GENE1", "GENE2", ...]`
ranked_gene_list	No	For GSEA: gene-to-score mapping (e.g., log2FC)	`{"TP53": 2.5, "BRCA1": -1.3, ...}`
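To make `correction_method` and `p_value_cutoff` concrete, here is a minimal standard-library sketch of the two named corrections. The function names are chosen for this example only; `statsmodels.stats.multitest.multipletests` is the usual library route, and how this skill treats the separate `fdr` option is not specified in the table, so it is not modeled here.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

def bonferroni(pvalues):
    """Bonferroni adjusted p-values, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]

raw = [0.01, 0.04, 0.03, 0.005]
p_value_cutoff = 0.05  # default threshold from the parameter table

bh = benjamini_hochberg(raw)
bf = bonferroni(raw)
print("BH significant:", [i for i, p in enumerate(bh) if p < p_value_cutoff])
print("Bonferroni significant:", [i for i, p in enumerate(bf) if p < p_value_cutoff])
```

Bonferroni is the more conservative of the two, so it typically keeps fewer terms at the same cutoff.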

Evidence Grading

Tier	Symbol	Criteria	Examples
T1	[T1]	Curated/experimental enrichment	PANTHER, Reactome Analysis Service
T2	[T2]	Computational enrichment, well-validated	gseapy ORA/GSEA, STRING functional enrichment
T3	[T3]	Text-mining/predicted enrichment	Enrichr non-curated libraries
T4	[T4]	Single-source annotation	Individual gene GO annotations from QuickGO

Primary Enrichment Tools

Tool	Input	Output	Use For
`gseapy.enrichr()`	gene_list, gene_sets, organism	`.results` DataFrame	ORA with 220+ libraries
`gseapy.prerank()`	rnk (ranked Series), gene_sets	`.res2d` DataFrame	GSEA analysis
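A hedged sketch of how the two gseapy entry points in the table might be called. The wrapper names `run_ora` and `run_gsea` and the default library names are assumptions for illustration; `gseapy.enrichr()` queries the Enrichr web service, so it needs network access, and both calls need gseapy and pandas installed.

```python
def rank_pairs(scores):
    """Sort a gene-to-score mapping (e.g. log2FC) from highest to lowest score."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

def run_ora(gene_list, library="GO_Biological_Process_2023", organism="human"):
    """ORA through Enrichr; the library name is an assumed example."""
    import gseapy as gp  # imported lazily so the helper above works without gseapy
    enr = gp.enrichr(gene_list=gene_list, gene_sets=[library],
                     organism=organism, outdir=None)
    return enr.results  # DataFrame of enriched terms

def run_gsea(scores, library="MSigDB_Hallmark_2020"):
    """Preranked GSEA on a gene-to-score mapping; the library name is an assumed example."""
    import pandas as pd
    import gseapy as gp
    rnk = pd.Series(dict(rank_pairs(scores)))  # ranked Series, highest score first
    pre = gp.prerank(rnk=rnk, gene_sets=library, outdir=None, seed=0)
    return pre.res2d  # DataFrame of GSEA results

# Example input in the shape shown for ranked_gene_list.
scores = {"TP53": 2.5, "BRCA1": -1.3, "EGFR": 0.4}
print(rank_pairs(scores))
```

Three genes are far too few for a meaningful GSEA run; a real ranked list should cover all measured genes rather than only the significant ones.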

Cross-Validation Tools

Tool	Key Parameters	Evidence Grade
`PANTHER_enrichment`	gene_list (comma-sep), organism, annotation_dataset	[T1]
`STRING_functional_enrichment`	protein_ids, species	[T2]
`ReactomeAnalysis_pathway_enrichment`	identifiers (space-sep), page_size	[T1]
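The cross-validation tools expect identifiers in different string formats, which is easy to get wrong. The helpers below are a small sketch of that formatting step only; the function names are hypothetical, and the actual ToolUniverse call is deliberately not shown because its invocation shape is not given here.

```python
# Evidence grades copied from the table above.
EVIDENCE_GRADE = {
    "PANTHER_enrichment": "T1",
    "STRING_functional_enrichment": "T2",
    "ReactomeAnalysis_pathway_enrichment": "T1",
}

def clean_genes(genes):
    """Strip whitespace, drop empty entries, and de-duplicate while keeping order."""
    stripped = (g.strip() for g in genes)
    return list(dict.fromkeys(g for g in stripped if g))

def panther_gene_list(genes):
    """Comma-separated string, as expected by the gene_list parameter."""
    return ",".join(clean_genes(genes))

def reactome_identifiers(genes):
    """Space-separated string, as expected by the identifiers parameter."""
    return " ".join(clean_genes(genes))

genes = ["TP53", " BRCA1", "EGFR", "TP53", ""]
print(panther_gene_list(genes))
print(reactome_identifiers(genes))
```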

ID Conversion Tools

Tool	Input	Output
`MyGene_batch_query`	gene_ids, fields	Symbol, Entrez, Ensembl mappings
`STRING_map_identifiers`	protein_ids, species	Preferred names, STRING IDs
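Before converting identifiers it helps to know which type was supplied. The following is one possible heuristic for the auto-detection mentioned for `gene_id_type`, not the skill's actual logic. The patterns cover `ENS...G` style Ensembl gene IDs, numeric Entrez IDs, and UniProt accessions; organism-specific schemes such as FlyBase or WormBase gene IDs are not handled and fall through to `symbol`.

```python
import re
from collections import Counter

ENSEMBL_GENE = re.compile(r"ENS[A-Z]*G\d{11}(\.\d+)?")
ENTREZ = re.compile(r"\d+")
UNIPROT = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}"
)

def classify_id(identifier):
    """Guess the ID type of a single identifier; anything unmatched is a symbol."""
    s = identifier.strip()
    if ENSEMBL_GENE.fullmatch(s):
        return "ensembl"
    if ENTREZ.fullmatch(s):
        return "entrez"
    if UNIPROT.fullmatch(s):
        return "uniprot"
    return "symbol"

def detect_id_type(identifiers):
    """Majority vote over the list, so a few odd entries do not flip the result."""
    if not identifiers:
        raise ValueError("identifiers must not be empty")
    votes = Counter(classify_id(i) for i in identifiers)
    return votes.most_common(1)[0][0]

print(detect_id_type(["TP53", "BRCA1", "EGFR"]))
print(detect_id_type(["ENSG00000141510", "ENSMUSG00000059552"]))
```

A mixed or ambiguous list is a sign that the input should be checked by hand or mapped explicitly with one of the conversion tools above.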

When to Use This Skill

Input Parameters

Core Principles

Decision Tree: ORA vs GSEA

Decision Tree: gseapy vs ToolUniverse Tools

Quick Start Workflow

Evidence Grading

Supported Organisms

Common Patterns

Pattern 1: Standard DEG Enrichment (ORA)

Pattern 2: Ranked Gene List (GSEA)

Pattern 3: BixBench Enrichment Question

Pattern 4: Multi-Organism Enrichment

Troubleshooting

Tool Reference

Primary Enrichment Tools

Cross-Validation Tools

ID Conversion Tools

Detailed Documentation

Resources
