Comprehensive single-cell RNA-seq analysis and expression matrix processing using scanpy, anndata, scipy, and ToolUniverse.

LOOK UP, DON'T GUESS

When uncertain about any scientific fact, SEARCH databases first (PubMed, UniProt, ChEMBL, ClinVar, etc.) rather than reasoning from memory. A database-verified answer is always more reliable than a guess.

When to Use This Skill

Apply when users:

Have scRNA-seq data (h5ad, 10X, CSV count matrices) and want analysis
Ask about cell type identification, clustering, or annotation
Need differential expression analysis by cell type or condition
Want gene-expression correlation analysis (e.g., gene length vs expression by cell type)
Ask about PCA, UMAP, t-SNE for expression data
Need Leiden/Louvain clustering on expression matrices
Want statistical comparisons between cell types (t-test, ANOVA, fold change)
Ask about marker genes, batch correction, trajectory, or cell-cell communication

BixBench Coverage: 18+ questions across 5 projects (bix-22, bix-27, bix-31, bix-33, bix-36)

Scanpy vs Seurat Equivalents

Operation	Seurat (R)	Scanpy (Python)
Load data	`Read10X()`	`sc.read_10x_mtx()`
Normalize	`NormalizeData()`	`sc.pp.normalize_total() + sc.pp.log1p()`
Find HVGs	`FindVariableFeatures()`	`sc.pp.highly_variable_genes()`
PCA	`RunPCA()`	`sc.tl.pca()`
Cluster	`FindClusters()`	`sc.tl.leiden()`
UMAP	`RunUMAP()`	`sc.tl.umap()`
Find markers	`FindMarkers()`	`sc.tl.rank_genes_groups()`
Batch correction	`RunHarmony()`	`harmonypy.run_harmony()`
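
To make the Normalize row concrete, here is a minimal NumPy sketch of the arithmetic that `sc.pp.normalize_total()` followed by `sc.pp.log1p()` performs. It is an illustration, not scanpy's implementation: the helper name `normalize_log1p`, the toy matrix, and the choice of `target_sum=1e4` are assumptions made for the example (scanpy's default target is the median of per-cell totals).

```python
import numpy as np

def normalize_log1p(counts, target_sum=1e4):
    """Scale each cell (row) to `target_sum` total counts, then apply log(1 + x).

    Hypothetical helper for illustration; assumes a dense cells x genes matrix.
    """
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    # Guard against empty cells so they stay all-zero instead of producing NaN.
    totals[totals == 0] = 1.0
    return np.log1p(counts / totals * target_sum)

# Toy matrix: 2 cells (rows) x 3 genes (columns).
toy_counts = np.array([[1, 1, 2],
                       [0, 10, 30]])
normalized = normalize_log1p(toy_counts)
```

After this step every non-empty cell has the same total on the un-logged scale, so differences in sequencing depth no longer dominate comparisons between cells.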

Evidence Grading

Grade	Criteria	Example
High confidence	Marker padj < 0.01, log2FC > 1, expressed in > 25% of cluster cells	CD3D as T-cell marker with padj = 1e-50, log2FC = 3.2, pct = 0.85
Moderate confidence	padj < 0.05, log2FC > 0.5, or expressed in 10-25% of cluster	FOXP3 in Treg cluster with padj = 0.001, pct = 0.18
Low confidence	padj < 0.05 but log2FC < 0.5 or low pct_diff between clusters	Ubiquitously expressed gene with marginal enrichment
Unreliable	Fewer than 20 cells in cluster, or QC metrics suggest doublets	Cluster with mean nGenes > 6000 and high doublet score
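
The grading table above can be sketched as a small function. This is one possible reading of the thresholds, not a definitive rule set: the function name `grade_marker`, its parameters, and the fallback label "Not significant" are assumptions for illustration, and the Moderate row is interpreted here as requiring padj < 0.05, log2FC > 0.5, and at least 10% of cluster cells expressing the gene.

```python
def grade_marker(padj, log2fc, pct_expressing, n_cells, suspected_doublets=False):
    """Assign an evidence grade to a candidate marker gene.

    Hypothetical helper; thresholds follow one reading of the grading table.
    pct_expressing is the fraction (0-1) of cluster cells expressing the gene.
    """
    # Cluster-level problems override any per-gene statistics.
    if n_cells < 20 or suspected_doublets:
        return "Unreliable"
    if padj < 0.01 and log2fc > 1 and pct_expressing > 0.25:
        return "High confidence"
    # Assumed reading of the Moderate row: all three conditions together.
    if padj < 0.05 and log2fc > 0.5 and pct_expressing >= 0.10:
        return "Moderate confidence"
    if padj < 0.05:
        return "Low confidence"
    # Label not in the table; added so every input gets a grade.
    return "Not significant"

# The CD3D example from the table, with an assumed cluster size of 500 cells.
cd3d_grade = grade_marker(padj=1e-50, log2fc=3.2, pct_expressing=0.85, n_cells=500)
```

Checking cluster size and doublet status first mirrors the table's intent: a marker from an unreliable cluster should not be reported as high confidence however small its adjusted p-value.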

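Troubleshooting

Issue	Solution
`ModuleNotFoundError: leidenalg`	`pip install leidenalg`
Sparse matrix errors	Densify before dense-only operations: `X = adata.X.toarray() if issparse(adata.X) else adata.X` (requires `from scipy.sparse import issparse`)
Wrong matrix orientation	AnnData expects cells/samples as rows and genes as columns. If the rows are genes (typically far more genes than samples), transpose with `adata = adata.T`
NaN in correlation	Filter paired values before correlating: `valid = ~np.isnan(x) & ~np.isnan(y)`
Too few cells for DE	Need >= 3 cells per condition per cell type
Memory error	Use `sc.pp.highly_variable_genes()` and subset with `adata[:, adata.var.highly_variable]` to reduce features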
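
Two of the fixes above, densifying a sparse matrix and dropping NaN pairs before a correlation, can be sketched with NumPy and SciPy alone. The helper names `to_dense` and `nan_safe_pearson`, the toy arrays, and the minimum of three valid pairs are assumptions for illustration, not part of scanpy or of this skill.

```python
import numpy as np
from scipy.sparse import csr_matrix, issparse

def to_dense(X):
    """Return a dense ndarray whether X is sparse or already dense."""
    return X.toarray() if issparse(X) else np.asarray(X)

def nan_safe_pearson(x, y):
    """Pearson r over positions where both x and y are not NaN."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    valid = ~np.isnan(x) & ~np.isnan(y)
    # Assumed floor: refuse to report r from fewer than 3 paired values.
    if valid.sum() < 3:
        return float("nan")
    return float(np.corrcoef(x[valid], y[valid])[0, 1])

# Sparse toy counts (2 cells x 3 genes), as adata.X often is.
sparse_counts = csr_matrix(np.array([[0, 2, 0], [1, 0, 3]]))
dense_counts = to_dense(sparse_counts)

# Toy gene-length vs mean-expression vectors with missing values.
gene_length = np.array([1.0, 2.0, np.nan, 4.0, 5.0])
mean_expr = np.array([2.0, 4.0, 6.0, np.nan, 10.0])
r = nan_safe_pearson(gene_length, mean_expr)
```

Densifying is only advisable for matrices that fit in memory; for large datasets, subsetting to highly variable genes first keeps the dense copy small.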

Single-Cell Genomics and Expression Matrix Analysis

LOOK UP, DON'T GUESS

When to Use This Skill

Core Principles

Required Packages

Workflow Decision Tree

Data Loading

Quality Control

Complete Pipeline (Quick Reference)

Differential Expression Decision Tree

Statistical Tests (Quick Reference)

Batch Correction (Harmony)

ToolUniverse Integration

Data Discovery (before analysis)

Cell Type Markers

Gene Annotation

Cell-Cell Communication

Enrichment (Post-DE)

Clinical Context (for tumor immunology)

Scanpy vs Seurat Equivalents

Reasoning Framework for Result Interpretation

Evidence Grading

Interpretation Guidance

Synthesis Questions

Troubleshooting

Reference Documentation
