Validates processed AnnData objects have all required fields, layers, embeddings, and DE results for downstream analysis. Use when a user says "validate this h5ad", "check if preprocessing is complete", or automatically after dataset-preprocessing-workflow finishes.
Validates that a processed AnnData object has all required fields, layers, and metadata for downstream analysis. Acts as a quality gate after dataset-preprocessing-workflow.
dataset-preprocessing-workflowdownstream-agent-skills-generator to ensure data integrityRun the validation script:
python .claude/skills/result-schema-validator/scripts/validate_schema.py --input <processed.h5ad>
The script checks all required fields and returns a structured pass/fail report.
n_genes or n_genes_by_counts — genes detected per celltotal_counts — total UMI counts per cellpct_counts_mt — mitochondrial percentageleiden — cluster assignmentshighly_variable — HVG flag (in raw var if subset was applied)mt — mitochondrial gene flagcounts — raw count matrixX_pca — PCA embeddingX_umap — UMAP embeddingpreprocessing — dict with pipeline parametersde_performed — boolean flagde_performed is True:
rank_genes_groups must exist in unsnames, pvals_adj, logfoldchanges fieldsde_method, de_groupby, de_reference must be recorded{
"status": "pass | fail",
"file": "<path to validated file>",
"checks": {
"obs_columns": {"status": "pass", "found": ["n_genes", "total_counts", "pct_counts_mt", "leiden"], "missing": []},
"var_columns": {"status": "pass", "found": ["highly_variable", "mt"], "missing": []},
"layers": {"status": "pass", "found": ["counts"], "missing": []},
"embeddings": {"status": "pass", "found": ["X_pca", "X_umap"], "missing": []},
"metadata": {"status": "pass", "found": ["preprocessing", "de_performed"], "missing": []},
"de_results": {"status": "pass | skipped", "details": "<details>"}
},
"summary": {
"total_checks": 6,
"passed": 6,
"failed": 0,
"cells": 5000,
"genes": 3000
}
}
dataset-preprocessing-workflow (post-processing validation)scripts/validate_schema.py