Compare experimental conditions in spatial transcriptomics data using pseudobulk differential expression with method-aware PyDESeq2 or Wilcoxon testing and explicit replicate handling.
You are Spatial Condition, the condition-comparison skill for OmicsClaw. Your role is to compare treatment groups, disease states, or experimental conditions in spatial transcriptomics data using replicate-aware pseudobulk statistics rather than spot-level pseudoreplication.
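Key capabilities:

- Replicate-aware pseudobulk DE with PyDESeq2 (`pydeseq2`) or an explicit Wilcoxon fallback.
- A standard figure gallery rendered through the `skills/spatial/_lib/viz` layer.
- Figure-ready `figure_data/` CSVs plus a gallery manifest, so downstream tools can restyle the same analysis without recomputing DE.

Input formats:

| Format | Extension | Required Fields | Example |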
|---|---|---|---|
| AnnData (preprocessed) | .h5ad | layers["counts"] or raw, obs[condition_key], obs[sample_key] | multi_sample.h5ad |
| Demo | n/a | --demo flag | Runs spatial-preprocess demo then injects synthetic conditions / replicate IDs |
This skill has a multi-step pipeline with different statistical assumptions:
| Component | Input Matrix | Rationale |
|---|---|---|
| Pseudobulk aggregation | adata.layers["counts"] (raw) | Sum aggregation requires raw integer-like counts; summing log values is invalid |
| PyDESeq2 | pseudobulk raw integer counts | PyDESeq2 models counts with a negative-binomial GLM |
| Wilcoxon | pseudobulk raw counts internally transformed to CPM/log-space | OmicsClaw computes the transformation inside the wrapper before rank testing |
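
The first row of the table can be illustrated with a minimal pandas sketch of sum aggregation. It assumes the raw counts are already a dense spots x genes `DataFrame` and that spot metadata sits in an `obs` frame with the same index; `pseudobulk_sum` is a hypothetical name, not OmicsClaw's internal function. Per-cluster pseudobulk (`--cluster-key`) would repeat the same aggregation within each cluster.

```python
import numpy as np
import pandas as pd


def pseudobulk_sum(counts, obs, sample_key="sample_id", condition_key="condition"):
    """Sum raw spot-level counts into one profile per biological sample (hypothetical helper).

    counts: spots x genes DataFrame of raw integer-like counts (e.g. from layers["counts"]).
    obs:    spot metadata with sample and condition columns, same index as `counts`.
    Returns (samples x genes summed counts, sample -> condition Series).
    """
    # Summing is only valid on raw counts; log-normalized values are rejected.
    values = counts.to_numpy()
    if not np.allclose(values, np.round(values)):
        raise ValueError("pseudobulk expects raw integer-like counts, not log-normalized values")

    # Each biological replicate must belong to exactly one condition.
    conditions_per_sample = obs.groupby(sample_key)[condition_key].nunique()
    if (conditions_per_sample > 1).any():
        raise ValueError("a sample ID maps to more than one condition")

    # One row per sample: column-wise sum over that sample's spots.
    pb = counts.groupby(obs[sample_key]).sum()
    meta = obs.groupby(sample_key)[condition_key].first().loc[pb.index]
    return pb, meta
```

Aggregating to one row per replicate is what keeps the downstream test from treating thousands of spots as independent observations.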
Core principle: pseudobulk always starts from raw counts, then the DE method operates on the aggregated matrix.
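
The second half of that principle, a DE method operating on the aggregated samples x genes matrix, can be sketched for the Wilcoxon path. This assumes log2(CPM + 1) as the transform, a fold change computed as the difference of mean log-CPM between `other` and `reference`, and Benjamini-Hochberg correction across genes within one contrast. OmicsClaw's exact transform, pseudocount, and fold-change definition are not stated here, so treat these as assumptions; `wilcoxon_pseudobulk`, `bh_adjust`, and the `stat` column name are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy.stats import ranksums


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (monotone, capped at 1)."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity by taking running minima from the largest p-value down.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted


def wilcoxon_pseudobulk(pb, meta, other, reference, alternative="two-sided"):
    """Per-gene rank-sum test of `other` vs `reference` on log-CPM pseudobulk profiles.

    pb:   samples x genes raw summed counts.
    meta: Series mapping sample ID -> condition label.
    """
    # Library-size normalize each pseudobulk sample to CPM, then log-transform.
    cpm = pb.div(pb.sum(axis=1), axis=0) * 1e6
    logcpm = np.log2(cpm + 1.0)

    x = logcpm.loc[meta[meta == other].index]      # samples in the tested condition
    y = logcpm.loc[meta[meta == reference].index]  # samples in the reference condition

    # SciPy's ranksums uses a normal approximation; `alternative` needs SciPy >= 1.7.
    tests = [ranksums(x[g], y[g], alternative=alternative) for g in pb.columns]
    stats = np.array([t.statistic for t in tests])
    pvals = np.array([t.pvalue for t in tests])

    return pd.DataFrame(
        {
            # Positive log2fc means higher in `other` than in `reference`.
            "log2fc": x.mean(axis=0) - y.mean(axis=0),
            "stat": stats,
            "pvalue": pvals,
            "pvalue_adj": bh_adjust(pvals),
        },
        index=pb.columns,
    )
```

With only a few replicates per condition, rank-sum p-values cannot become very small, which is a practical limitation of the fallback compared with the negative-binomial path.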
Data layout requirement:

```python
import scanpy as sc

# Snapshot raw counts BEFORE normalization; pseudobulk aggregation reads this layer.
adata.layers["counts"] = adata.X.copy()
sc.pp.normalize_total(adata)               # adata.X: library-size normalized
sc.pp.log1p(adata)                         # adata.X: log-normalized expression
adata.obs["condition"] = condition_labels  # e.g. "treated" / "control"
adata.obs["sample_id"] = sample_labels     # biological replicate IDs
```

If `layers["counts"]` is missing, OmicsClaw falls back to `adata.raw` or `adata.X` with a warning.
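
The fallback order can be sketched as a small helper. This is an illustration only: `pick_count_matrix` is a hypothetical name, it duck-types on `layers`, `raw`, and `X` so it runs without AnnData installed, and warning on the `adata.raw` branch as well as the `adata.X` branch is an assumption.

```python
import warnings
from types import SimpleNamespace  # only used to build a stand-in object below


def pick_count_matrix(adata):
    """Return (matrix, source_label), preferring raw counts (hypothetical helper).

    Order: layers["counts"] -> adata.raw.X -> adata.X (the last two with a warning).
    """
    layers = getattr(adata, "layers", None) or {}
    if "counts" in layers:
        return layers["counts"], "layers['counts']"
    if getattr(adata, "raw", None) is not None:
        warnings.warn("layers['counts'] missing; falling back to adata.raw.X")
        return adata.raw.X, "raw"
    warnings.warn("no counts layer or adata.raw; falling back to adata.X (may be log-normalized)")
    return adata.X, "X"


# Stand-in object with no counts layer but a populated .raw
example = SimpleNamespace(layers={}, raw=SimpleNamespace(X=[[3, 0, 7]]), X=[[1.4, 0.0, 2.1]])
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    matrix, source = pick_count_matrix(example)
print(source, matrix, len(caught))
```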
Method-specific testing:

- `pydeseq2`: fit an NB GLM per cluster/contrast.
- `wilcoxon`: run a rank-sum test on transformed pseudobulk profiles.

Outputs:

- Export `figure_data/*.csv` and `figure_data/manifest.json` for downstream customization.
- Write `report.md`, `result.json`, `processed.h5ad`, figures, tables, and reproducibility helpers.

OmicsClaw treats spatial-condition visualization as a two-layer system:

- the standard gallery, rendered by the skill itself;
- a customization layer that reads `figure_data/` and does not recompute pseudobulk DE.

The standard gallery is declared as a recipe instead of hard-coded if/else plot branches. Current gallery roles include:

- `overview`: condition labels on tissue and a global pseudobulk volcano
- `diagnostic`: cluster-level DE burden projected onto spatial and UMAP views
- `supporting`: per-contrast significant-gene counts and per-cluster burden
- `uncertainty`: adjusted p-value distributions, sample support, and skipped-contrast summaries when applicable

```bash
# PyDESeq2 default run
oc run spatial-condition \
  --input <data.h5ad> --output <dir> \
  --condition-key condition --sample-key sample_id

# PyDESeq2 with explicit cluster labels and tuning
oc run spatial-condition \
  --input <data.h5ad> --method pydeseq2 \
  --condition-key condition --sample-key sample_id --cluster-key leiden \
  --reference-condition control \
  --min-counts-per-gene 10 --min-samples-per-condition 2 \
  --pydeseq2-fit-type parametric \
  --pydeseq2-size-factors-fit-type ratio \
  --pydeseq2-alpha 0.05 --pydeseq2-n-cpus 1 \
  --output <dir>

# Wilcoxon fallback mode
oc run spatial-condition \
  --input <data.h5ad> --method wilcoxon \
  --condition-key condition --sample-key sample_id \
  --reference-condition control \
  --wilcoxon-alternative two-sided \
  --output <dir>

# Demo mode
oc run spatial-condition --demo --output /tmp/cond_demo

# Direct script entrypoint
python skills/spatial/spatial-condition/spatial_condition.py \
  --input <data.h5ad> --method pydeseq2 --output <dir>
```
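
The two wrapper-level gates used in the PyDESeq2 example above (`--min-samples-per-condition` and `--min-counts-per-gene`) can be sketched as follows. The function names, the `<other>_vs_<reference>` contrast label, and applying the gene threshold to total counts across pseudobulk samples are assumptions for illustration, not documented OmicsClaw behavior. The skipped records resemble what `skipped_contrasts.csv` presumably reports.

```python
import pandas as pd


def plan_contrasts(meta, reference, min_samples_per_condition=2):
    """Split `<other> vs <reference>` contrasts into attempted and skipped (hypothetical).

    meta: Series mapping pseudobulk sample ID -> condition label for one
    partition (the whole dataset, or one cluster when --cluster-key is used).
    """
    n_per_condition = meta.value_counts()
    n_ref = int(n_per_condition.get(reference, 0))
    attempted, skipped = [], []
    for other in sorted(n_per_condition.index):
        if other == reference:
            continue
        n_other = int(n_per_condition[other])
        if min(n_other, n_ref) < min_samples_per_condition:
            # Too few biological replicates on either side: record the contrast and skip it.
            skipped.append(
                {"contrast": f"{other}_vs_{reference}", "n_other": n_other,
                 "n_reference": n_ref, "reason": "too few replicates"}
            )
        else:
            # Same orientation as contrast=["condition", other, reference].
            attempted.append((other, reference))
    columns = ["contrast", "n_other", "n_reference", "reason"]
    return attempted, pd.DataFrame(skipped, columns=columns)


def filter_genes(pb, min_counts_per_gene=10):
    """Keep genes whose total pseudobulk count reaches the threshold (assumed rule)."""
    return pb.loc[:, pb.sum(axis=0) >= min_counts_per_gene]
```

Gating on replicates rather than spots is the point of the design: a condition with one sample cannot supply a variance estimate, no matter how many spots it contains.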
Every successful standard OmicsClaw wrapper run, including `oc run` and
conversational skill execution, also writes a top-level `README.md` and
`reproducibility/analysis_notebook.ipynb` to make the output directory easier
to inspect and rerun.

PyDESeq2 mode:

- Design: `~ condition`, fitted with the official `DeseqDataSet(...)` controls.
- Testing: `DeseqStats(...)` with an explicit `contrast=["condition", other, reference]`.
- Output columns: `base_mean`, `log2fc`, `stat`, `pvalue`, and adjusted `pvalue_adj`.

Core tuning flags:

- `--reference-condition`: defines the sign direction of the contrast
- `--cluster-key`: chooses the cluster partition for per-cluster pseudobulk
- `--min-counts-per-gene`: wrapper-level gene filter before DE testing
- `--min-samples-per-condition`: wrapper-level replicate gate
- `--pydeseq2-fit-type`: official PyDESeq2 dispersion fit mode
- `--pydeseq2-size-factors-fit-type`: official normalization size-factor mode
- `--pydeseq2-refit-cooks`: whether Cook's outlier refitting is enabled
- `--pydeseq2-alpha`: official target FDR used by `DeseqStats`
- `--pydeseq2-cooks-filter`: whether Cook's filtering is applied in result extraction
- `--pydeseq2-independent-filter`: whether independent filtering is applied in result extraction
- `--pydeseq2-n-cpus`: CPU count passed to PyDESeq2

Wilcoxon mode:

- Testing: `scipy.stats.ranksums()` is applied to per-gene transformed values.
- Output columns: `log2fc`, the Wilcoxon statistic, raw `pvalue`, and BH-adjusted `pvalue_adj`.

Core tuning flags:

- `--reference-condition`: defines the sign direction of the contrast
- `--cluster-key`: chooses the cluster partition for per-cluster pseudobulk
- `--min-counts-per-gene`: wrapper-level gene filter before DE testing
- `--min-samples-per-condition`: minimum replicate count before a contrast is attempted
- `--wilcoxon-alternative`: official SciPy alternative hypothesis

Output layout:

```text
output_directory/
├── README.md
├── report.md
├── result.json
├── processed.h5ad
├── figures/
│   ├── condition_spatial_context.png
│   ├── pseudobulk_volcano.png
│   ├── condition_effect_burden_spatial.png
│   ├── condition_effect_burden_umap.png
│   ├── condition_de_barplot.png
│   ├── cluster_de_burden.png
│   ├── condition_pvalue_distribution.png
│   ├── sample_counts_by_condition.png
│   └── manifest.json
├── figure_data/
│   ├── pseudobulk_de.csv
│   ├── pseudobulk_volcano_points.csv
│   ├── per_cluster_summary.csv
│   ├── skipped_contrasts.csv
│   ├── cluster_de_metrics.csv
│   ├── top_de_genes.csv
│   ├── sample_counts_by_condition.csv
│   ├── condition_run_summary.csv
│   ├── condition_spatial_points.csv
│   ├── condition_umap_points.csv   # when UMAP is available
│   └── manifest.json
├── tables/
│   ├── pseudobulk_de.csv
│   ├── per_cluster_summary.csv
│   └── skipped_contrasts.csv
└── reproducibility/
    ├── analysis_notebook.ipynb
    ├── commands.sh
    ├── requirements.txt
    └── r_visualization.sh
```
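
Because the exported tables already hold the DE statistics, restyling needs no recomputation. A minimal sketch, assuming `pseudobulk_de.csv` carries the documented `log2fc` and `pvalue_adj` columns; `label_volcano` and its 0.05 and |log2fc| >= 1 cutoffs are hypothetical display choices.

```python
import numpy as np
import pandas as pd


def label_volcano(de, alpha=0.05, min_abs_log2fc=1.0):
    """Add -log10(padj) and an up/down/ns label to an exported DE table (hypothetical)."""
    out = de.copy()
    # Genes removed by Cook's or independent filtering can have NaN padj: treat as not significant.
    # Clipping avoids -log10(0) = inf for extremely small adjusted p-values.
    padj = out["pvalue_adj"].fillna(1.0).clip(lower=1e-300)
    out["neg_log10_padj"] = -np.log10(padj)
    significant = padj < alpha
    out["direction"] = "ns"
    out.loc[significant & (out["log2fc"] >= min_abs_log2fc), "direction"] = "up"
    out.loc[significant & (out["log2fc"] <= -min_abs_log2fc), "direction"] = "down"
    return out


# Usage against a finished run (path from the output layout above):
# de = pd.read_csv("output_directory/figure_data/pseudobulk_de.csv")
# print(label_volcano(de)["direction"].value_counts())
```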

`README.md` and `reproducibility/analysis_notebook.ipynb` are generated by the
standard OmicsClaw wrapper. Direct script execution usually produces the
skill-native outputs plus `reproducibility/commands.sh`.
The bundled optional R templates live under:

```text
skills/spatial/spatial-condition/r_visualization/
├── README.md
└── condition_publication_template.R
```
Required:

- `scanpy`
- `scipy`

Optional:

- `pydeseq2` for proper negative-binomial GLM inference

Optional (R):

- `ggplot2`

Trigger conditions:

Chaining partners:

- `spatial-preprocess` for counts preservation and baseline clustering
- `spatial-enrichment` for pathway analysis on DE genes
- `spatial-genes` for comparing treatment-responsive genes to SVG programs