Name: Spatial De
Author: TianGzlab

| Format | Extension | Required Fields | Example |
|---|---|---|---|
| AnnData (preprocessed) | `.h5ad` | `adata.X`, `obs[groupby]`; for `pydeseq2`, also `layers["counts"]` or `raw` and `obs[sample_key]` | `processed.h5ad` |
| Demo | n/a | `--demo` flag | Runs spatial-preprocess demo, then injects synthetic sample IDs |

| Method | Input Matrix | Rationale |
|---|---|---|
| `wilcoxon` | `adata.X` (log-normalized) | `scanpy.tl.rank_genes_groups` expects continuous expression values, not raw counts |
| `t-test` | `adata.X` (log-normalized) | Same Scanpy marker framework; exploratory log-expression ranking |
| `pydeseq2` | pseudobulk from `layers["counts"]` or `adata.raw` | PyDESeq2 models raw integer-like counts with a negative-binomial GLM |
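
The distinction between log-normalized values and raw integer-like counts can be checked before choosing a method. The following is a minimal sketch, not the skill's own validation code: `is_integer_like` and its `tol` parameter are hypothetical names, and the matrices are made-up toy values stored as plain nested lists rather than an AnnData object.

```python
def is_integer_like(matrix, tol=1e-6):
    """Return True if every value is non-negative and within tol of an integer.

    Hypothetical helper for illustration; not part of the spatial-de skill.
    """
    return all(
        value >= 0 and abs(value - round(value)) <= tol
        for row in matrix
        for value in row
    )


# Toy raw counts (cells x genes): suitable as pseudobulk input for pydeseq2.
raw_counts = [[0, 3, 12], [5, 0, 1]]

# Toy log1p-style values: suitable for wilcoxon / t-test, not for pydeseq2.
lognorm = [[0.0, 1.386, 2.565], [1.792, 0.0, 0.693]]

counts_ok = is_integer_like(raw_counts)
lognorm_ok = is_integer_like(lognorm)
```

A check like this can only flag obviously non-count data; an all-zero or coincidentally whole-valued normalized matrix would still pass, so it complements rather than replaces the layer convention.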

adata.layers["counts"] = adata.X.copy()   # before normalize_total + log1p
adata.X = lognorm_expr                     # after normalize_total + log1p
adata.obs["leiden"] = cluster_labels
adata.obs["sample_id"] = biological_sample_ids
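
Given the convention above, pseudobulk aggregation for `pydeseq2` amounts to summing raw counts over the cells of each sample and group. The sketch below illustrates that idea with plain Python lists. It is an assumption about the general technique, not the skill's implementation: the function name `pseudobulk`, the `min_cells` parameter (loosely modeled on what `--min-cells-per-sample` appears to control), and the toy data are all hypothetical.

```python
from collections import defaultdict


def pseudobulk(counts, sample_ids, groups, min_cells=10):
    """Sum raw counts per (sample, group) combination.

    Combinations supported by fewer than min_cells cells are reported as
    skipped instead of being aggregated into the result.
    """
    sums = {}
    n_cells = defaultdict(int)
    for row, sample, group in zip(counts, sample_ids, groups):
        key = (sample, group)
        n_cells[key] += 1
        if key not in sums:
            sums[key] = list(row)
        else:
            sums[key] = [a + b for a, b in zip(sums[key], row)]
    kept = {k: v for k, v in sums.items() if n_cells[k] >= min_cells}
    skipped = sorted(k for k in sums if n_cells[k] < min_cells)
    return kept, skipped


# Toy data: 5 cells x 2 genes, two samples, two cluster labels.
counts = [[1, 0], [2, 1], [0, 4], [3, 3], [5, 0]]
sample_ids = ["s1", "s1", "s1", "s2", "s2"]
groups = ["0", "0", "1", "0", "0"]

kept, skipped = pseudobulk(counts, sample_ids, groups, min_cells=2)
```

Here `kept` holds one summed count vector per retained sample-group pair, and `skipped` lists the pairs with too few cells, which is the kind of information a skipped-sample-groups table would record.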

# Default exploratory marker discovery
oc run spatial-de \
  --input <processed.h5ad> --output <dir>

# Wilcoxon with richer Scanpy controls
oc run spatial-de \
  --input <processed.h5ad> --method wilcoxon \
  --groupby leiden --scanpy-corr-method benjamini-hochberg \
  --scanpy-pts --filter-markers \
  --min-in-group-fraction 0.25 --min-fold-change 1.0 \
  --max-out-group-fraction 0.5 --output <dir>

# Pairwise Scanpy comparison
oc run spatial-de \
  --input <processed.h5ad> --method t-test \
  --groupby leiden --group1 0 --group2 1 \
  --n-top-genes 20 --output <dir>

# Sample-aware pseudobulk DE with PyDESeq2
oc run spatial-de \
  --input <processed.h5ad> --method pydeseq2 \
  --groupby leiden --group1 0 --group2 1 \
  --sample-key sample_id \
  --min-cells-per-sample 10 --min-counts-per-gene 10 \
  --pydeseq2-fit-type parametric \
  --pydeseq2-size-factors-fit-type ratio \
  --pydeseq2-alpha 0.05 --pydeseq2-n-cpus 1 \
  --output <dir>

# Demo mode
oc run spatial-de --demo --output /tmp/de_demo

# Direct script entrypoint
python skills/spatial/spatial-de/spatial_de.py \
  --input <processed.h5ad> --method wilcoxon --output <dir>
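
The three marker-filter thresholds in the Wilcoxon example share their names with parameters of `scanpy.tl.filter_rank_genes_groups`. The sketch below shows the kind of per-gene test such thresholds imply. It is a simplified illustration under assumptions: the record fields `pct_in`, `pct_out`, and `fold_change`, the gene names, and the use of strict inequalities are hypothetical and may not match how the skill or Scanpy handles boundary values.

```python
def passes_marker_filter(gene, min_in_group_fraction=0.25,
                         min_fold_change=1.0, max_out_group_fraction=0.5):
    """Keep a gene only if it is expressed broadly inside the group,
    sparsely outside it, and with a large enough fold change."""
    return (
        gene["pct_in"] > min_in_group_fraction
        and gene["pct_out"] < max_out_group_fraction
        and gene["fold_change"] > min_fold_change
    )


# Hypothetical candidate markers for one group.
candidates = [
    {"gene": "GeneA", "pct_in": 0.80, "pct_out": 0.10, "fold_change": 4.0},
    {"gene": "GeneB", "pct_in": 0.10, "pct_out": 0.05, "fold_change": 6.0},
    {"gene": "GeneC", "pct_in": 0.90, "pct_out": 0.85, "fold_change": 1.2},
]

kept_markers = [g["gene"] for g in candidates if passes_marker_filter(g)]
```

In this toy set only `GeneA` survives: `GeneB` is expressed in too few in-group cells and `GeneC` is expressed in too many out-of-group cells.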

output_dir/
├── README.md
├── report.md
├── result.json
├── processed.h5ad
├── figures/
│   ├── de_group_spatial_context.png
│   ├── de_marker_dotplot.png
│   ├── de_volcano.png
│   ├── de_effect_burden_spatial.png
│   ├── de_effect_burden_umap.png
│   ├── de_marker_heatmap.png
│   ├── de_top_hits_barplot.png
│   ├── group_de_burden.png
│   ├── de_pvalue_distribution.png
│   ├── sample_counts_by_group.png      # PyDESeq2 when sample support exists
│   ├── skipped_sample_groups.png       # PyDESeq2 when sample-group combos were skipped
│   └── manifest.json
├── figure_data/
│   ├── markers_top.csv
│   ├── top_de_hits.csv
│   ├── de_full.csv
│   ├── de_plot_points.csv
│   ├── de_significant.csv
│   ├── group_de_metrics.csv
│   ├── de_run_summary.csv
│   ├── sample_counts_by_group.csv
│   ├── skipped_sample_groups.csv
│   ├── de_spatial_points.csv
│   ├── de_umap_points.csv              # when UMAP is available
│   └── manifest.json
├── tables/
│   ├── markers_top.csv
│   ├── de_full.csv
│   ├── de_significant.csv
│   ├── group_de_metrics.csv
│   ├── sample_counts_by_group.csv
│   └── skipped_sample_groups.csv
└── reproducibility/
    ├── analysis_notebook.ipynb
    ├── commands.sh
    ├── requirements.txt
    └── r_visualization.sh

skills/spatial/spatial-de/r_visualization/
├── README.md
└── de_publication_template.R

Spatial De

Why This Exists

Core Capabilities

Input Formats

Input Matrix Convention

Workflow

Visualization Contract

CLI Reference

Example Queries

Algorithm / Methodology

Wilcoxon

t-test

PyDESeq2

Output Structure

Safety

Dependencies

Integration with Orchestrator

Citations
