# Monte Carlo simulation study protocol
This skill enables StatsClaw to automatically design and execute Monte Carlo simulation studies that evaluate the finite-sample properties of statistical estimators. Given a DGP (Data Generating Process) specification and an estimator, it produces simulation code, runs the study, and reports results on consistency, bias, RMSE, coverage, size, and power.
Any of the following user intents activates this skill. Exact wording is NOT required; the leader routes semantically:
A short prompt like "simulate the finite-sample properties of the new estimator" is sufficient.
The planner produces a simulation spec (`sim-spec.md`) alongside the standard `spec.md` and `test-spec.md`.

Trigger: User requests implementation of a new estimator AND wants simulation evidence of its properties.
Agent Sequence:
leader → planner → [builder ∥ simulator] → tester → scriber → reviewer → shipper?
What happens:
- Planner produces `spec.md` (estimator implementation), `test-spec.md` (unit tests + simulation validation), and `sim-spec.md` (DGP + scenario grid + metrics)
- Builder implements `spec.md` (in worktree); runs in parallel with simulator
- Simulator implements `sim-spec.md` (in worktree); runs in parallel with builder
- Tester runs `test-spec.md` AND executes the full simulation, comparing results against acceptance criteria

Trigger: User wants a simulation study for an already-implemented estimator (no code changes needed).
Agent Sequence:
leader → planner → simulator → tester → scriber → reviewer → shipper?
What happens:
- Planner produces `sim-spec.md` (DGP + scenarios) and `test-spec.md` (simulation validation criteria)
- Simulator implements `sim-spec.md` (in worktree)
- Tester runs `test-spec.md`

No builder is dispatched since the estimator already exists.
Simulation can also be added to standard code workflows (1, 2, 4, 5) when the user's request includes simulation intent. In that case:
- A simulator agent joins the pipeline and produces a `simulation.md` artifact
- The planner adds a simulation spec (`sim-spec.md`)

Planner produces `sim-spec.md` containing:
## Data Generating Process
### Model
Y_i = X_i'β + ε_i
### Parameters
| Parameter | Symbol | True Value(s) | Type |
| --- | --- | --- | --- |
| Coefficients | β | (1.0, 0.5, -0.3) | vector |
| Error SD | σ | 1.0 | scalar |
### Distributions
| Component | Distribution | Parameters |
| --- | --- | --- |
| X | N(0, Σ) | Σ = I_p |
| ε (baseline) | N(0, σ²) | σ = 1.0 |
| ε (heavy-tailed) | t(3) scaled to σ² | df = 3 |
| ε (skewed) | χ²(1) centered and scaled | — |
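One way to realize these error distributions in Python/NumPy (the spec names `numpy.random.default_rng` for Python runs); `draw_errors` is an illustrative helper, not part of the spec:

```python
import numpy as np

def draw_errors(rng, n, dist="normal", sigma=1.0):
    """Draw n mean-zero errors with variance sigma^2 for each scenario."""
    if dist == "normal":
        return rng.normal(0.0, sigma, size=n)
    if dist == "t3":
        # Var[t(3)] = df / (df - 2) = 3, so divide by sqrt(3) to get variance sigma^2
        return rng.standard_t(df=3, size=n) * sigma / np.sqrt(3.0)
    if dist == "chisq1":
        # chi^2(1) has mean 1 and variance 2: center, then rescale
        return (rng.chisquare(df=1, size=n) - 1.0) * sigma / np.sqrt(2.0)
    raise ValueError(f"unknown error distribution: {dist}")

rng = np.random.default_rng(20260326)
for dist in ("normal", "t3", "chisq1"):
    e = draw_errors(rng, 200_000, dist)
    print(dist, round(float(e.mean()), 2), round(float(e.var()), 2))
```

Each case is standardized to the same variance, so changing the error distribution changes shape (tails, skew) without changing error scale across scenarios.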
### Dimensions
| Variable | Values |
| --- | --- |
| N (sample size) | 100, 200, 500, 1000, 5000 |
| p (covariates) | 3 |
## Estimator Interface
Function: `my_estimator(Y, X, ...)`
Returns: list with components `$coefficients`, `$std_errors`, `$conf_int`, `$p_values`
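For concreteness, a hypothetical Python counterpart of this interface: a plain OLS reference estimator returning a dict in place of the R list, with normal-approximation intervals and p-values. The body is a sketch, not the estimator under study:

```python
import math
import numpy as np

def my_estimator(Y, X):
    """Hypothetical OLS reference implementation of the estimator interface."""
    X, Y = np.asarray(X, float), np.asarray(Y, float)
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ Y
    resid = Y - X @ beta
    s2 = resid @ resid / (n - p)             # residual variance
    se = np.sqrt(s2 * np.diag(XtX_inv))      # classical standard errors
    z = np.abs(beta / se)
    # two-sided p-values via the normal CDF: Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))
    p_values = np.array([2.0 * (1.0 - 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))) for v in z])
    conf_int = np.column_stack([beta - 1.96 * se, beta + 1.96 * se])
    return {"coefficients": beta, "std_errors": se,
            "conf_int": conf_int, "p_values": p_values}
```

Any estimator that returns these four components can be dropped into the same simulation harness unchanged.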
## Scenario Grid
| Dimension | Values | Total Levels |
| --- | --- | --- |
| Sample size (N) | 100, 200, 500, 1000, 5000 | 5 |
| Error distribution | Normal, t(3), χ²(1) | 3 |
| Correlation (ρ) | 0.0, 0.5, 0.9 | 3 |
Total scenarios: 5 × 3 × 3 = 45
Replications per scenario: R = 2000
Total simulation runs: 90,000
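The grid can be enumerated mechanically; a Python sketch using `itertools.product`, with field names chosen here for illustration:

```python
from itertools import product

sample_sizes = [100, 200, 500, 1000, 5000]
error_dists = ["normal", "t3", "chisq1"]
rhos = [0.0, 0.5, 0.9]
R = 2000  # replications per scenario

scenarios = [
    {"index": i + 1, "N": n, "dist": d, "rho": rho}
    for i, (n, d, rho) in enumerate(product(sample_sizes, error_dists, rhos))
]

print(len(scenarios), len(scenarios) * R)  # → 45 90000
```

The 1-based `index` is what feeds the seed formula in the Reproducibility section, so scenario ordering must stay fixed across runs.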
## Required Metrics
- Bias (absolute and relative)
- Standard deviation of estimates
- RMSE
- Coverage of 95% CI
- Size at α = 0.05 (test H₀: β₁ = β₁_true)
- Power at α = 0.05 (test H₀: β₁ = 0 when β₁ = 0.5)
- SE ratio (estimated SE / empirical SD)
- Failure rate
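Assuming per-replication results are collected as arrays (with failed runs stored as NaN), the metrics for one coefficient could be computed as below; the function name and array layout are illustrative, not mandated by the spec:

```python
import numpy as np

def summarize(est, se, lo, hi, p_null, beta_true, alpha=0.05):
    """Required metrics for one coefficient across R replications.

    est/se: point estimates and estimated SEs; lo/hi: 95% CI bounds;
    p_null: p-values for H0: beta = 0 (used for power). NaN marks a failed run.
    """
    ok = ~np.isnan(est)
    est, se, lo, hi, p_null = est[ok], se[ok], lo[ok], hi[ok], p_null[ok]
    bias = est.mean() - beta_true
    sd = est.std(ddof=1)
    coverage = float(np.mean((lo <= beta_true) & (beta_true <= hi)))
    return {
        "bias": bias,
        "rel_bias": bias / beta_true,
        "sd": sd,
        "rmse": float(np.sqrt(np.mean((est - beta_true) ** 2))),
        "coverage": coverage,
        "size": 1.0 - coverage,              # CI inversion of H0: beta = beta_true
        "power": float(np.mean(p_null < alpha)),
        "se_ratio": se.mean() / sd,          # estimated SE / empirical SD
        "failure_rate": 1.0 - float(ok.mean()),
    }
```

Here size is computed by inverting the 95% CI; if the estimator's test is not CI-based, size should instead be the rejection rate of p-values computed under the true null.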
## Acceptance Criteria
| Criterion | Threshold | At N ≥ |
| --- | --- | --- |
| Relative bias | < 5% | 500 |
| Coverage (95% CI) | ∈ [0.93, 0.97] | 500 |
| Size (α = 0.05) | ∈ [0.03, 0.07] | 500 |
| SE ratio | ∈ [0.95, 1.05] | 500 |
| RMSE convergence rate | slope ∈ [-0.6, -0.4] on log-log | all |
| Failure rate | < 1% | all |
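The RMSE convergence criterion checks that RMSE decays at roughly the √N rate; a minimal sketch fits a line to log(RMSE) against log(N) and checks the slope against the [-0.6, -0.4] band:

```python
import numpy as np

def rmse_slope(Ns, rmses):
    """Slope of log(RMSE) on log(N); near -0.5 for a sqrt(N)-consistent estimator."""
    slope, _intercept = np.polyfit(np.log(Ns), np.log(rmses), 1)
    return slope

Ns = np.array([100, 200, 500, 1000, 5000])
rmses = 1.0 / np.sqrt(Ns)                 # idealized sqrt(N) decay
s = rmse_slope(Ns, rmses)
print(round(s, 3), -0.6 <= s <= -0.4)     # → -0.5 True
```

In practice the slope is computed per error distribution and correlation level, using the simulated RMSE column from the results table.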
## Reproducibility
Master seed: 20260326
Per-replication seed: master_seed + (scenario_index - 1) * R + replication
RNG type: Mersenne-Twister (R) / numpy.random.default_rng (Python)
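The seed formula maps each (scenario, replication) pair to a unique, reproducible stream. A Python sketch (note that `numpy.random.default_rng`, named above, uses PCG64 rather than Mersenne-Twister):

```python
import numpy as np

MASTER_SEED = 20260326
R = 2000  # replications per scenario

def replication_seed(scenario_index, replication):
    """Seed formula from the spec; unique because replication <= R."""
    return MASTER_SEED + (scenario_index - 1) * R + replication

# One independent generator per replication
rng = np.random.default_rng(replication_seed(scenario_index=3, replication=17))

# The last seed of scenario 1 and the first seed of scenario 2 are adjacent,
# so scenario blocks never overlap:
print(replication_seed(1, R), replication_seed(2, 1))  # → 20262326 20262327
```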
When simulation is active, the two-pipeline architecture extends to three pipelines:
```
                    planner (bridge)
                  /        |         \
           spec.md    test-spec.md    sim-spec.md
                /          |           \
         builder ─ ─ (parallel) ─ ─ simulator
  (code pipeline)          |     (simulation pipeline)
               \           |           /
     implementation.md     |     simulation.md
                \          |          /
                 \         v         /
                  tester   <-- sequential, after merge-back
             (test pipeline)
                      |
                  audit.md
                      |
             scriber (recording)
                      |
            reviewer (convergence)
                      |
                   shipper
```
Key properties:
- Builder works only from `spec.md`
- Simulator works only from `sim-spec.md`
- Tester works from `test-spec.md` (which includes simulation validation criteria)

In simulation workflows, the tester has expanded responsibilities:
- Runs the unit tests from `test-spec.md`
- Executes the full simulation specified by `test-spec.md` and compares results against the acceptance criteria

Tester's `test-spec.md` includes a dedicated Simulation Validation section (produced by the planner) that specifies the acceptance criteria for the simulation results.
Tester produces the simulation results tables in audit.md, along with pass/fail assessments for each acceptance criterion.
Reviewer cross-compares THREE pipelines:
- Does `implementation.md` satisfy `spec.md`?
- Does `simulation.md` satisfy `sim-spec.md`?
- Does `audit.md` confirm both against `test-spec.md`?

Reviewer checks:
- Simulator did not read `spec.md` or `test-spec.md` (pipeline isolation)
- Tester did not read `sim-spec.md` or `spec.md` (pipeline isolation)
- Total runtime is recorded in `simulation.md`; warn if it exceeds 1 hour

The simulator never sees `spec.md` or `test-spec.md`. The simulation study is designed from mathematical principles (`sim-spec.md`), not from knowledge of implementation details.

All of these should trigger simulation workflows:
"Implement the panel data estimator from the paper and run Monte Carlo simulations"
"Study the finite-sample properties of the new robust estimator"
"Run a simulation study: DGP is linear model with heteroskedastic errors, evaluate OLS vs GLS"
"Check if the confidence intervals have correct coverage for small samples"
"Monte Carlo: test consistency and coverage of the bootstrap estimator"
"Evaluate the new method's bias and RMSE across different sample sizes"