Differential expression and marker discovery for spatial transcriptomics using Scanpy Wilcoxon / t-test or sample-aware pseudobulk PyDESeq2.
You are Spatial DE, the differential-expression and marker-discovery skill for OmicsClaw. Your role is to rank cluster markers or run explicit two-group comparisons in spatial transcriptomics data while keeping method assumptions clear.
- Consistent `allowed_extra_flags`, output structure, and future method extensibility across spatial DE workflows.
- Explicit `group1` vs `group2` comparison within any `groupby` column for exploratory spot-level DE.
- Pseudobulk aggregation by `sample_key` for sample-aware DE.
- Design formulas such as `~ sample_id + condition`.
- `filter_rank_genes_groups` controls for specificity filtering.
- A standard gallery built on the shared `skills/spatial/_lib/viz` layer.
- `figure_data/` CSVs plus a gallery manifest so downstream tools can restyle the same DE result without recomputing it.

Input formats:

| Format | Extension | Required Fields | Example |
|---|---|---|---|
| AnnData (preprocessed) | .h5ad | adata.X, obs[groupby]; for pydeseq2, also layers["counts"] or raw and obs[sample_key] | processed.h5ad |
| Demo | n/a | --demo flag | Runs spatial-preprocess demo, then injects synthetic sample IDs |
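
As a rough pre-flight check, the required fields in the input table can be probed before a run. The sketch below is hypothetical (`missing_inputs` is not an OmicsClaw function) and only reports gaps. Per the notes further down, OmicsClaw may handle a missing default `leiden` column or a missing counts layer with a fallback rather than an error. A `SimpleNamespace` stands in for an AnnData object, which exposes the same `obs`, `layers`, and `raw` attributes.

```python
from types import SimpleNamespace

import pandas as pd


def missing_inputs(adata, method, groupby="leiden", sample_key=None):
    """List fields from the input table that `adata` lacks for `method`.

    Hypothetical helper: it reports gaps and does not apply any fallback.
    """
    if method not in {"wilcoxon", "t-test", "pydeseq2"}:
        raise ValueError(f"unknown method: {method}")
    missing = []
    if groupby not in adata.obs.columns:
        missing.append(f'obs["{groupby}"]')
    if method == "pydeseq2":
        # PyDESeq2 needs raw counts and a biological-sample column.
        if "counts" not in adata.layers and adata.raw is None:
            missing.append('layers["counts"] or raw')
        key = sample_key or "<sample_key>"
        if sample_key is None or sample_key not in adata.obs.columns:
            missing.append(f'obs["{key}"]')
    return missing


# Stand-in object with the same attribute names AnnData uses.
stub = SimpleNamespace(obs=pd.DataFrame({"leiden": ["0", "1"]}), layers={}, raw=None)
print(missing_inputs(stub, "pydeseq2", sample_key="sample_id"))
# ['layers["counts"] or raw', 'obs["sample_id"]']
```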
Different DE methods have different statistical assumptions. OmicsClaw keeps them separate:
| Method | Input Matrix | Rationale |
|---|---|---|
| `wilcoxon` | `adata.X` (log-normalized) | `scanpy.tl.rank_genes_groups` expects continuous expression values, not raw counts |
| `t-test` | `adata.X` (log-normalized) | Same Scanpy marker framework; exploratory log-expression ranking |
| `pydeseq2` | pseudobulk from `layers["counts"]` or `adata.raw` | PyDESeq2 models raw integer-like counts with a negative-binomial GLM |
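
The `pydeseq2` row relies on pseudobulk aggregation: raw counts are summed per biological sample and compared group, so each sample, not each spot, becomes one observation. Below is a minimal pandas/NumPy sketch under stated assumptions: `pseudobulk_counts` is a hypothetical helper, the `condition` column name follows the `contrast=["condition", group1, group2]` convention described for PyDESeq2, and `min_cells_per_sample` mirrors the spirit of `--min-cells-per-sample` rather than its exact implementation. The result has samples as rows and genes as columns, the orientation PyDESeq2's `DeseqDataSet` takes for its counts input.

```python
import numpy as np
import pandas as pd


def pseudobulk_counts(counts, obs, groupby, sample_key, groups,
                      min_cells_per_sample=10, var_names=None):
    """Sum raw counts per (sample, group) for the compared groups.

    Hypothetical sketch. `counts` is a cells x genes array (dense or scipy
    sparse) aligned row-by-row with `obs`. Returns the pseudobulk counts
    (samples x genes), matching metadata, and skipped (sample, group, n_cells).
    """
    mask = obs[groupby].isin(groups).to_numpy()
    sub_obs, sub_counts = obs.loc[mask], counts[mask]
    # .indices maps each (sample, group) key to positional row indices.
    positions = sub_obs.groupby([sample_key, groupby], observed=True).indices
    rows, meta, skipped = [], [], []
    for sample, group in sorted(positions):
        idx = positions[(sample, group)]
        if len(idx) < min_cells_per_sample:
            skipped.append((sample, group, len(idx)))
            continue
        rows.append(np.asarray(sub_counts[idx].sum(axis=0)).ravel())
        meta.append({"sample": sample, "condition": group, "n_cells": len(idx)})
    index = [f"{m['sample']}__{m['condition']}" for m in meta]
    pseudobulk = pd.DataFrame(rows, index=index, columns=var_names)
    metadata = pd.DataFrame(meta, index=index)
    return pseudobulk, metadata, skipped


counts = np.array([[1, 0], [2, 1], [3, 0], [0, 5], [1, 1]])
obs = pd.DataFrame({"leiden": ["0", "0", "1", "1", "0"],
                    "sample_id": ["s1", "s1", "s1", "s1", "s2"]})
pb, meta, skipped = pseudobulk_counts(counts, obs, "leiden", "sample_id",
                                      ["0", "1"], min_cells_per_sample=2)
# s2 contributes a single cell to group "0", so it lands in `skipped`.
```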
Core rule:

- Use `wilcoxon` / `t-test` for exploratory marker ranking on normalized expression.
- Use `pydeseq2` only when you have a meaningful `sample_key` that represents biological replicates.

Recommended preprocessing layout:
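```python
import scanpy as sc

adata.layers["counts"] = adata.X.copy()  # raw counts, saved before normalization
sc.pp.normalize_total(adata)
sc.pp.log1p(adata)  # adata.X now holds log-normalized expression
adata.obs["leiden"] = cluster_labels  # cluster labels used as groupby
adata.obs["sample_id"] = biological_sample_ids  # one ID per biological replicate
```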
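
Once `obs` carries a `sample_id` column, the core rule and the design choice described for `pydeseq2` (`~ condition` when each sample falls in one compared group, `~ sample_id + condition` when samples contribute to both) can be checked up front. This is a hypothetical helper, not OmicsClaw code; the `min_samples_per_group=2` threshold is an assumption, and any overlap between the two groups' samples is treated as a paired-style design.

```python
import pandas as pd


def choose_design(obs, groupby, sample_key, group1, group2, min_samples_per_group=2):
    """Return a design formula for a group1-vs-group2 pseudobulk run, or None.

    Hypothetical helper: None means too few biological replicates for
    pydeseq2, so wilcoxon / t-test is the safer exploratory choice.
    """
    sub = obs[obs[groupby].isin([group1, group2])]
    samples1 = set(sub.loc[sub[groupby] == group1, sample_key])
    samples2 = set(sub.loc[sub[groupby] == group2, sample_key])
    if min(len(samples1), len(samples2)) < min_samples_per_group:
        return None
    # Samples seen in both groups call for a sample term in the design.
    if samples1 & samples2:
        return f"~ {sample_key} + condition"
    return "~ condition"


obs = pd.DataFrame({"leiden": ["0", "0", "1", "1"],
                    "sample_id": ["a", "b", "a", "b"]})
print(choose_design(obs, "leiden", "sample_id", "0", "1"))  # ~ sample_id + condition
```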
If layers["counts"] is missing, OmicsClaw falls back to adata.raw or
adata.X with a warning.
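
The fallback above can be sketched as follows, assuming `adata.raw` is tried before `adata.X` and that `raw.X` holds counts. `resolve_count_matrix` is hypothetical, and the integer-likeness warning is an extra check added here (not documented OmicsClaw behavior) because PyDESeq2 models raw integer-like counts.

```python
import warnings
from types import SimpleNamespace

import numpy as np


def resolve_count_matrix(adata):
    """Pick a count matrix: layers["counts"], then raw.X, then X (hypothetical)."""
    if "counts" in adata.layers:
        source, matrix = 'layers["counts"]', adata.layers["counts"]
    elif adata.raw is not None:
        source, matrix = "raw", adata.raw.X
        warnings.warn('layers["counts"] missing; falling back to adata.raw')
    else:
        source, matrix = "X", adata.X
        warnings.warn('layers["counts"] and adata.raw missing; falling back to adata.X')
    # Extra check (not documented OmicsClaw behavior): PyDESeq2 wants
    # integer-like counts. Sparse matrices expose their nonzeros via .data.
    values = matrix.data if hasattr(matrix, "toarray") else np.asarray(matrix)
    if not np.allclose(values, np.round(values)):
        warnings.warn(f"{source} is not integer-like; it may be normalized")
    return source, matrix


# Stand-in with log-normalized X only: both warnings fire.
lognorm_only = SimpleNamespace(layers={}, raw=None, X=np.log1p(np.array([[1.0, 2.0]])))
source, matrix = resolve_count_matrix(lognorm_only)
```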
Workflow:

1. Validate that `groupby` exists; if the default `leiden` column is missing, OmicsClaw can compute a minimal clustering pass.
2. Select the method:
   - `wilcoxon` / `t-test`: exploratory Scanpy marker discovery
   - `pydeseq2`: explicit two-group, sample-aware pseudobulk DE
3. Run `rank_genes_groups(...)` plus optional `filter_rank_genes_groups(...)` for the Scanpy methods, or `DeseqDataSet` / `DeseqStats` for PyDESeq2.
4. Export `figure_data/*.csv` and `figure_data/manifest.json` for downstream customization.
5. Write `markers_top.csv`, `de_full.csv`, `de_significant.csv`, `report.md`, `result.json`, figures, and reproducibility helpers.

OmicsClaw treats spatial-de visualization as a two-layer system:
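- Layer 1: the Python standard gallery, built on the shared `skills/spatial/_lib/viz` layer.
- Layer 2: optional downstream customization (such as the bundled R templates) that reads `figure_data/` and does not recompute Scanpy ranking or PyDESeq2.

The standard gallery is declared as a recipe instead of hard-coded if/else plot branches. Current gallery roles include: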
- `overview`: grouping labels on tissue, top-marker dotplots, and volcano overviews of the strongest DE comparisons
- `diagnostic`: group-level DE burden projected onto spatial and UMAP views
- `supporting`: top-hit barplots, group-burden summaries, and marker heatmaps
- `uncertainty`: adjusted p-value distributions, pseudobulk sample support, and skipped sample-group summaries when applicable

```bash
# Default exploratory marker discovery
oc run spatial-de \
  --input <processed.h5ad> --output <dir>

# Wilcoxon with richer Scanpy controls
oc run spatial-de \
  --input <processed.h5ad> --method wilcoxon \
  --groupby leiden --scanpy-corr-method benjamini-hochberg \
  --scanpy-pts --filter-markers \
  --min-in-group-fraction 0.25 --min-fold-change 1.0 \
  --max-out-group-fraction 0.5 --output <dir>

# Pairwise Scanpy comparison
oc run spatial-de \
  --input <processed.h5ad> --method t-test \
  --groupby leiden --group1 0 --group2 1 \
  --n-top-genes 20 --output <dir>

# Sample-aware pseudobulk DE with PyDESeq2
oc run spatial-de \
  --input <processed.h5ad> --method pydeseq2 \
  --groupby leiden --group1 0 --group2 1 \
  --sample-key sample_id \
  --min-cells-per-sample 10 --min-counts-per-gene 10 \
  --pydeseq2-fit-type parametric \
  --pydeseq2-size-factors-fit-type ratio \
  --pydeseq2-alpha 0.05 --pydeseq2-n-cpus 1 \
  --output <dir>

# Demo mode
oc run spatial-de --demo --output /tmp/de_demo

# Direct script entrypoint
python skills/spatial/spatial-de/spatial_de.py \
  --input <processed.h5ad> --method wilcoxon --output <dir>
```
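
The `--filter-markers` thresholds in the Wilcoxon example act on each marker's in-group fraction, out-group fraction, and fold change. The pandas sketch below illustrates the idea behind `scanpy.tl.filter_rank_genes_groups` on a hypothetical marker table (`logfoldchange`, `pct_in`, and `pct_out` are assumed column names). It is not Scanpy's implementation: Scanpy computes its own fold changes and its comparison details may differ, whereas here `min_fold_change` is applied directly to the log fold-change column with inclusive thresholds.

```python
import pandas as pd


def filter_markers(markers, min_in_group_fraction=0.25, min_fold_change=1.0,
                   max_out_group_fraction=0.5, compare_abs=False):
    """Keep markers passing specificity thresholds.

    Hypothetical sketch; column names `logfoldchange`, `pct_in`, `pct_out`
    are assumptions, and thresholds are inclusive.
    """
    fold = markers["logfoldchange"]
    if compare_abs:
        fold = fold.abs()  # let strongly down-regulated genes pass too
    keep = (
        (markers["pct_in"] >= min_in_group_fraction)
        & (markers["pct_out"] <= max_out_group_fraction)
        & (fold >= min_fold_change)
    )
    return markers.loc[keep].reset_index(drop=True)


markers = pd.DataFrame({
    "gene": ["A", "B", "C", "D"],
    "logfoldchange": [2.0, 0.5, -1.5, 1.2],
    "pct_in": [0.8, 0.9, 0.6, 0.1],
    "pct_out": [0.1, 0.2, 0.3, 0.05],
})
print(filter_markers(markers)["gene"].tolist())                    # ['A']
print(filter_markers(markers, compare_abs=True)["gene"].tolist())  # ['A', 'C']
```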
Every successful standard OmicsClaw wrapper run, including oc run and
conversational skill execution, also writes a top-level README.md and
reproducibility/analysis_notebook.ipynb to make the output directory easier
to inspect and rerun.
Wilcoxon (`wilcoxon`):

- Input: `adata.X` (log-normalized expression)
- Runs `scanpy.tl.rank_genes_groups(..., method="wilcoxon")`
- Exposes `scanpy_tie_correct` and `scanpy_corr_method`
- Optional specificity filtering with `scanpy.tl.filter_rank_genes_groups(...)`

Core tuning flags:

- `--groupby`
- `--group1` / `--group2` for explicit pairwise mode
- `--scanpy-corr-method`
- `--scanpy-rankby-abs`
- `--scanpy-pts`
- `--scanpy-tie-correct`
- `--filter-markers`
- `--min-in-group-fraction`
- `--min-fold-change`
- `--max-out-group-fraction`
- `--filter-compare-abs`

t-test (`t-test`):

- Input: `adata.X` (log-normalized expression)
- Runs `scanpy.tl.rank_genes_groups(..., method="t-test")`

Use case: quick exploratory marker ranking when the user wants a fast first pass and accepts stronger parametric assumptions.
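
For intuition about the `wilcoxon` scores, here is a minimal NumPy/pandas sketch of a one-group-vs-rest Wilcoxon rank-sum test with a normal approximation, followed by Benjamini-Hochberg adjustment (the `benjamini-hochberg` option of `--scanpy-corr-method`). It is an illustration, not Scanpy's code: ties receive average ranks but the variance is not tie-corrected (compare `--scanpy-tie-correct`), and the function names are hypothetical.

```python
import math

import numpy as np
import pandas as pd


def rank_sum_z(X, in_group):
    """Per-gene Wilcoxon rank-sum z-scores, group vs rest (no tie correction)."""
    X = np.asarray(X, dtype=float)
    in_group = np.asarray(in_group, dtype=bool)
    n1 = int(in_group.sum())
    n2 = X.shape[0] - n1
    if n1 == 0 or n2 == 0:
        raise ValueError("both the group and the rest need at least one cell")
    # Rank each gene (column) across all cells; ties get the average rank.
    ranks = pd.DataFrame(X).rank(axis=0, method="average").to_numpy()
    rank_sum = ranks[in_group].sum(axis=0)
    expected = n1 * (n1 + n2 + 1) / 2.0
    std = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    return (rank_sum - expected) / std


def two_sided_p(z):
    """Two-sided normal p-values computed with the complementary error function."""
    return np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in np.ravel(z)])


def benjamini_hochberg(p):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(p, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity: q_(i) = min over j >= i of p_(j) * n / j.
    adjusted = np.minimum.accumulate(scaled[::-1])[::-1]
    out = np.empty(n)
    out[order] = np.minimum(adjusted, 1.0)
    return out


X = np.array([[5, 1], [6, 2], [7, 3], [1, 1], [2, 2], [3, 3]], dtype=float)
in_group = np.array([True, True, True, False, False, False])
z = rank_sum_z(X, in_group)
p_adj = benjamini_hochberg(two_sided_p(z))
# Gene 0 is higher in the group (z > 0); gene 1 has identical ranks on both sides.
```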
PyDESeq2 (`pydeseq2`):

- Design `~ condition` if each sample belongs to only one compared group
- Design `~ sample_id + condition` if the same samples contribute to both groups
- Runs `DeseqDataSet(...)` + `DeseqStats(...)` with `contrast=["condition", group1, group2]`
- Reports `log2fc`, `pvalue_adj`, `stat`, and significance/effect-size helper columns

Core tuning flags:
- `--groupby`
- `--group1`
- `--group2`
- `--sample-key`
- `--min-cells-per-sample`
- `--min-counts-per-gene`
- `--pydeseq2-fit-type`
- `--pydeseq2-size-factors-fit-type`
- `--pydeseq2-refit-cooks`
- `--pydeseq2-alpha`
- `--pydeseq2-cooks-filter`
- `--pydeseq2-independent-filter`
- `--pydeseq2-n-cpus`

Output structure:

```
output_dir/
├── README.md
├── report.md
├── result.json
├── processed.h5ad
├── figures/
│   ├── de_group_spatial_context.png
│   ├── de_marker_dotplot.png
│   ├── de_volcano.png
│   ├── de_effect_burden_spatial.png
│   ├── de_effect_burden_umap.png
│   ├── de_marker_heatmap.png
│   ├── de_top_hits_barplot.png
│   ├── group_de_burden.png
│   ├── de_pvalue_distribution.png
│   ├── sample_counts_by_group.png     # PyDESeq2 when sample support exists
│   ├── skipped_sample_groups.png      # PyDESeq2 when sample-group combos were skipped
│   └── manifest.json
├── figure_data/
│   ├── markers_top.csv
│   ├── top_de_hits.csv
│   ├── de_full.csv
│   ├── de_plot_points.csv
│   ├── de_significant.csv
│   ├── group_de_metrics.csv
│   ├── de_run_summary.csv
│   ├── sample_counts_by_group.csv
│   ├── skipped_sample_groups.csv
│   ├── de_spatial_points.csv
│   ├── de_umap_points.csv             # when UMAP is available
│   └── manifest.json
├── tables/
│   ├── markers_top.csv
│   ├── de_full.csv
│   ├── de_significant.csv
│   ├── group_de_metrics.csv
│   ├── sample_counts_by_group.csv
│   └── skipped_sample_groups.csv
└── reproducibility/
    ├── analysis_notebook.ipynb
    ├── commands.sh
    ├── requirements.txt
    └── r_visualization.sh
```
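
Because `de_full.csv` holds every tested gene, significance thresholds can be revisited downstream without rerunning PyDESeq2. A minimal sketch, assuming the table carries the `log2fc` and `pvalue_adj` columns listed for PyDESeq2 output; `rethreshold` and its default cutoffs are hypothetical, and the inline rows are toy values rather than real results. Rows with a missing adjusted p-value (for example after independent filtering) compare as `False` and are dropped.

```python
import io

import pandas as pd


def rethreshold(de_full, alpha=0.05, min_abs_log2fc=1.0):
    """Re-derive significant hits from a de_full-style table (hypothetical helper).

    Assumes `log2fc` and `pvalue_adj` columns; NaN adjusted p-values are dropped
    because NaN < alpha evaluates to False.
    """
    keep = (de_full["pvalue_adj"] < alpha) & (de_full["log2fc"].abs() >= min_abs_log2fc)
    return de_full.loc[keep].sort_values("pvalue_adj").reset_index(drop=True)


# Toy rows; in practice read e.g. output_dir/tables/de_full.csv with pd.read_csv.
toy = io.StringIO(
    "gene,log2fc,pvalue_adj\n"
    "A,2.1,0.001\n"
    "B,-0.4,0.01\n"
    "C,-1.8,0.2\n"
    "D,-1.3,0.03\n"
    "E,3.0,\n"
)
hits = rethreshold(pd.read_csv(toy))
print(hits["gene"].tolist())  # ['A', 'D']
```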
The bundled optional R templates live under:

```
skills/spatial/spatial-de/r_visualization/
├── README.md
└── de_publication_template.R
```
- `pydeseq2` requires a real `sample_key` and does not randomly split cells into pseudo-samples.

Required:

- `scanpy`

Optional (Python):

- `pydeseq2`

Optional (R):

- `ggplot2`

Trigger conditions:

Chaining:

- `processed.h5ad` from `spatial-preprocess`
- `spatial-condition`