How to create, maintain and extend Python experiments. Experiment lifecycle workflows and best practices.
This skill covers the complete lifecycle of Python experiments in experiments/. Experiments are the primary way this thesis generates empirical evidence.
Source of truth: filesystem → experiments/CLAUDE.md → this skill
| Artifact | Location |
|---|---|
| Tracking | Task file in tasks/ |
| Code | experiments/<label>/ |
| SPEC.md | experiments/<label>/SPEC.md |
| Tests | experiments/<label>/test_*.py (colocated) |
| Data | data/<label>/ |
| Thesis assets | thesis/assets/<label>/ |
```
experiments/<label>/
├── SPEC.md              # Research question, method, success criteria
├── stage_build.py       # Generate data
├── test_stage_build.py  # Colocated test
├── stage_analyze.py     # Compute results
└── stage_plot.py        # Create figures
```
Why stages? Separate expensive computation (build) from cheap iteration (analyze/plot). Re-run analysis without regenerating data.
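The separation pays off when a stage can detect that its expensive output already exists. A minimal sketch of that skip-if-built idea (the `needs_build` helper and `results.json` marker are hypothetical, not part of the repo's conventions):

```python
from pathlib import Path
import tempfile


def needs_build(output_dir: Path, marker: str = "results.json") -> bool:
    """True when the expensive build stage still has to run."""
    return not (output_dir / marker).exists()


tmp = Path(tempfile.mkdtemp())
assert needs_build(tmp)             # no results yet: build must run
(tmp / "results.json").write_text("{}")
assert not needs_build(tmp)         # data present: iterate on analyze/plot freely
```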
experiments/_example/ demonstrates all conventions. Study it first when creating a new experiment.
| Phase | Owner | Output |
|---|---|---|
| Ideation | Jörn + PM agent | Task file in tasks/inbox/ |
| Planning | Task agent (plan mode) | SPEC.md with method and success criteria |
| Development | Task agent | stage_* code, FINDINGS.md, artifacts, PR |
| Code review | Review agent | Approved PR (or escalate back) |
| Merge | PM agent | Merged PR, task moved to done/ |
After initial development, experiments may be rerun when the codebase changes.
| Phase | Owner | Output |
|---|---|---|
| Rerun | Any agent | Updated data and asset artifacts |
| Maintenance | Any agent | Updated FINDINGS.md; escalate if needed |
SPEC.md template:

```markdown
# <label> Experiment

**Status:** Ideation / Specified / In Progress / Complete

## Research Question
What are we trying to learn?

## Method
How will we answer the question?

## Success Criteria
What outcome means "we are satisfied"?

## Expected Outputs
- data/<label>/...
- thesis/assets/<label>/...
```
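An agent in the planning phase can sanity-check a draft SPEC.md against the required headings. A small sketch; `missing_sections` is a hypothetical helper, not an existing repo utility:

```python
REQUIRED = (
    "## Research Question",
    "## Method",
    "## Success Criteria",
    "## Expected Outputs",
)


def missing_sections(spec_text: str) -> list[str]:
    """Return the required SPEC.md headings absent from spec_text."""
    return [h for h in REQUIRED if h not in spec_text]


spec = "# demo Experiment\n\n## Research Question\nWhat?\n"
print(missing_sections(spec))  # → ['## Method', '## Success Criteria', '## Expected Outputs']
```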
FINDINGS.md is a living document updated on each rerun:

```markdown
# <label> Findings

## Results Summary
Key findings from the data.

## Interpretation
What the results mean for the research question.

## Escalation Procedures
When and how to flag issues.

## Changelog
- YYYY-MM-DD: Initial findings
- YYYY-MM-DD: Updated after <change>
```
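When appending a changelog entry programmatically, the date should come from the clock rather than be typed by hand. A tiny sketch (`changelog_entry` is a hypothetical helper):

```python
from datetime import date


def changelog_entry(change: str) -> str:
    """Format a FINDINGS.md changelog line for today, e.g. '- 2025-01-02: <change>'."""
    return f"- {date.today().isoformat()}: {change}"


print(changelog_entry("Updated after rerun"))
```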
"""Generate data for <label> experiment."""
from pathlib import Path
# All config hardcoded for reproducibility
CONFIG = {
"experiment_name": "<label>",
"variant": "full", # Change to "smoke" for quick runs
}
DATA_DIR = Path(__file__).parent.parent.parent / "data"
OUTPUT_DIR = DATA_DIR / CONFIG["experiment_name"]
def main():
OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
# Generate data
# Save to OUTPUT_DIR / "results.json" or similar
if __name__ == "__main__":
main()
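One way the `variant` switch can work: derive the expensive parameters from it so a smoke run finishes in seconds. The parameter name `N_SAMPLES` and the sizes are illustrative assumptions, not repo conventions:

```python
CONFIG = {
    "experiment_name": "<label>",
    "variant": "smoke",  # hypothetical quick-run setting
}

# Scale expensive parameters off the variant: full fidelity vs. fast smoke test
N_SAMPLES = 10_000 if CONFIG["variant"] == "full" else 100
print(N_SAMPLES)  # → 100
```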
"""Generate plots for <label> experiment."""
from pathlib import Path
import matplotlib.pyplot as plt
CONFIG = {"experiment_name": "<label>"}
DATA_DIR = Path(__file__).parent.parent.parent / "data"
INPUT_DIR = DATA_DIR / CONFIG["experiment_name"]
ASSETS_DIR = Path(__file__).parent.parent.parent / "thesis" / "assets" / CONFIG["experiment_name"]
def main():
ASSETS_DIR.mkdir(parents=True, exist_ok=True)
# Load data
# Create figures
plt.savefig(ASSETS_DIR / "figure.pdf")
if __name__ == "__main__":
main()
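When a stage creates several figures, the object-oriented matplotlib API (`fig.savefig` on an explicit `Figure`) is safer than module-level `plt.savefig`, which saves whatever the current figure happens to be. A headless sketch, assuming matplotlib is installed; the output path is a temp directory for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for agent/CI runs
import matplotlib.pyplot as plt
from pathlib import Path
import tempfile

out = Path(tempfile.mkdtemp())

fig, ax = plt.subplots()
ax.plot([0, 1], [0, 1])
ax.set_xlabel("x")
fig.savefig(out / "figure.pdf")  # save this specific figure, not global state
plt.close(fig)                   # free memory when producing many figures
```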
Escalate (update task file and notify Jörn) when: