Name: RNA-seq Differential Expression Analysis (DESeq2)
Author: FreedomIntelligence

RNA-seq Differential Expression Analysis (DESeq2)

Production-ready RNA-seq differential expression analysis using PyDESeq2. Performs DESeq2 normalization, dispersion estimation, Wald testing, LFC shrinkage, and result filtering. Handles multi-factor designs, multiple contrasts, batch effects, and integrates with gene enrichment (gseapy) and ToolUniverse annotation tools (UniProt, Ensembl, OpenTargets). Supports CSV/TSV/H5AD input formats and any organism. Use when analyzing RNA-seq count matrices, identifying DEGs, performing differential expression with statistical rigor, or answering questions about gene expression changes.

FreedomIntelligence | 2,097 stars | Mar 8, 2026

Categories: Bioinformatics

Comprehensive differential expression analysis of RNA-seq count data using PyDESeq2, with integrated enrichment analysis (gseapy) and gene annotation via ToolUniverse.
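
As a rough illustration of the pipeline described above, the sketch below wraps a single-factor PyDESeq2 run. It assumes PyDESeq2 0.5 or later, where `DeseqDataSet` accepts a formula string through `design` (earlier releases take `design_factors` instead). The function name `run_deseq2` and its parameters are hypothetical and not part of the skill. LFC shrinkage is left out because the coefficient name passed to `lfc_shrink` differs between releases.

```python
import pandas as pd


def run_deseq2(counts: pd.DataFrame, metadata: pd.DataFrame,
               factor: str, test_level: str, ref_level: str) -> pd.DataFrame:
    """Run a single-factor Wald test and return the PyDESeq2 results table.

    counts:   samples x genes matrix of raw integer counts
    metadata: samples x covariates table sharing the same index as counts
    """
    # Fail early, listing the exact level names, if the contrast cannot be formed.
    levels = set(metadata[factor].astype(str))
    missing = {test_level, ref_level} - levels
    if missing:
        raise ValueError(
            f"Contrast level(s) {sorted(missing)} not in {factor!r}: {sorted(levels)}"
        )

    # Imported lazily so the validation above works without PyDESeq2 installed.
    from pydeseq2.dds import DeseqDataSet
    from pydeseq2.ds import DeseqStats

    # Assumption: PyDESeq2 >= 0.5 (formula-style `design` argument).
    dds = DeseqDataSet(counts=counts, metadata=metadata, design=f"~{factor}")
    dds.deseq2()  # size factors, dispersions, and log fold changes

    stats = DeseqStats(dds, contrast=[factor, test_level, ref_level])
    stats.summary()  # Wald test plus adjusted p-values
    return stats.results_df
```

A call such as `run_deseq2(counts, metadata, "condition", "treated", "control")` would return one row per gene with columns including `baseMean`, `log2FoldChange`, `pvalue`, and `padj`; the factor and level names here are placeholders.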

BixBench Coverage: Validated on 53 BixBench questions across 15 computational biology projects covering RNA-seq, miRNA-seq, and differential expression analysis tasks.

Core Principles

Data-first approach - Load and validate count data and metadata BEFORE any analysis
Statistical rigor - Always use proper normalization, dispersion estimation, and multiple testing correction
Flexible design - Support single-factor, multi-factor, and interaction designs
Threshold awareness - Apply user-specified thresholds exactly (padj, log2FC, baseMean)
Reproducible - Set random seeds, document all parameters, output complete results
Question-driven - Parse what the user is actually asking and extract the specific answer
Enrichment integration - Chain DESeq2 results into pathway/GO enrichment when requested
English names - Use English gene and pathway names in all tool calls
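
The threshold-awareness principle can be sketched as a small filter over a results table. This assumes the PyDESeq2 column names `baseMean`, `log2FoldChange`, and `padj`. The helper `filter_degs`, its defaults, and the choice of strict versus non-strict inequalities are illustrative assumptions that should be matched to the wording of the question being answered.

```python
import pandas as pd


def filter_degs(results: pd.DataFrame, padj_max: float = 0.05,
                abs_lfc_min: float = 0.0, basemean_min: float = 0.0) -> pd.DataFrame:
    """Keep genes passing all three user-specified thresholds."""
    keep = (
        results["padj"].notna()                        # NaN padj: gene was filtered out
        & (results["padj"] < padj_max)                 # significance cutoff
        & (results["log2FoldChange"].abs() > abs_lfc_min)  # effect size, either direction
        & (results["baseMean"] >= basemean_min)        # minimum mean expression
    )
    return results.loc[keep]


# Toy results table (made-up values, for illustration only).
toy = pd.DataFrame(
    {
        "baseMean": [100.0, 80.0, 60.0, 50.0, 5.0],
        "log2FoldChange": [2.0, 1.8, 3.0, -0.3, -1.5],
        "padj": [0.01, 0.2, float("nan"), 0.001, 0.04],
    },
    index=["A", "B", "C", "D", "E"],
)
strict = filter_degs(toy, padj_max=0.05, abs_lfc_min=1.0, basemean_min=10.0)
up = strict[strict["log2FoldChange"] > 0]  # direction-specific subset
```

Passing the thresholds as arguments, rather than hard-coding them, keeps the reported DEG count traceable to the exact cutoffs the user asked for.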

Error	Solution
"No matching samples"	Check whether the count matrix needs transposing; strip whitespace from sample names
"Dispersion trend did not converge"	Use `fit_type='mean'`
"Contrast not found"	Check `metadata['factor'].unique()` for the exact level names
"Non-integer counts"	Round to integers, or use a t-test if the data are already normalized
"NaN in padj"	Independent filtering removed these genes; exclude them when counting DEGs
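
The first and fourth fixes in the table can be sketched with plain pandas. The helper names `align_counts` and `to_integer_counts` are hypothetical, and the sketch assumes sample IDs are stored in the metadata index. Rounding is only sensible for estimated counts that are close to integers; it should not be applied to normalized values such as TPM.

```python
import pandas as pd


def align_counts(counts: pd.DataFrame, metadata: pd.DataFrame):
    """Return (counts, metadata) with matching sample rows, counts as samples x genes."""
    counts, metadata = counts.copy(), metadata.copy()

    # Stray whitespace in labels is a common cause of "No matching samples".
    counts.index = counts.index.astype(str).str.strip()
    counts.columns = counts.columns.astype(str).str.strip()
    metadata.index = metadata.index.astype(str).str.strip()

    samples = set(metadata.index)
    # Sample IDs found only in the columns mean the matrix is genes x samples.
    if not (samples & set(counts.index)) and (samples & set(counts.columns)):
        counts = counts.T

    # Keep the metadata order and drop samples missing from either table.
    shared = [s for s in metadata.index if s in counts.index]
    if not shared:
        raise ValueError("No matching samples between counts and metadata")
    return counts.loc[shared], metadata.loc[shared]


def to_integer_counts(counts: pd.DataFrame) -> pd.DataFrame:
    """Round near-integer estimated counts to the integers DESeq2 expects."""
    return counts.round().astype(int)
```

Running `align_counts` before building the dataset means a shape or labeling problem surfaces as one clear error instead of a failure deep inside the fitting step.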

RNA-seq Differential Expression Analysis (DESeq2)

Core Principles

When to Use This Skill

Required Packages

Analysis Workflow

Step 1: Question Parsing

Step 1.5: Design Formula Decision Tree ⚠️ CRITICAL

Step 2: Data Loading & Validation

Step 2.5: Inspect Metadata Structure ⚠️ REQUIRED

Step 3: Run PyDESeq2

Multi-Factor Design (Common Real-World Case)

Step 4: Filter Results

Step 5: Dispersion Analysis (if asked)

Step 6: Enrichment Analysis (optional)

Step 7: Gene Annotation with ToolUniverse (optional)

Output Formatting

Common BixBench Patterns

Pattern 1: Basic DEG Count

Pattern 2: Specific Gene Value

Pattern 3: Direction-Specific

Pattern 4: Set Operations

Pattern 5: Dispersion Count

Error Handling

Validation Checklist

Known Limitations

PyDESeq2 vs R DESeq2 Differences

GO/KEGG Enrichment (gseapy vs clusterProfiler)

References

Utility Scripts
