Bootstrap a brand-new GAD eval project from source design documents — create the evals/<name>/ directory, convert source docs into a canonical REQUIREMENTS.xml, wire up scoring rubric dimensions (CLI efficiency, skill trigger accuracy, planning quality), and register the project so `gad eval list` and `gad eval run <name>` pick it up. Use this skill whenever the user says they want to "add a new eval", "turn this design doc into an eval", "create an eval project from X", "scaffold escape-the-dungeon / portfolio-bare / any new eval", or mentions feeding GAMEPLAY-DESIGN / REQUIREMENTS / spec documents into the GAD eval harness. Trigger even if the user doesn't say the exact word "eval" — any request that involves taking a spec and producing a testable, traceable GAD eval project should use this skill. Do NOT use for running an already-registered eval (that's gad:eval-run) or preserving completed outputs (that's gad:eval-preserve).
Bootstrap a new GAD evaluation project from source design documents so it is immediately runnable via the standard GAD eval harness. This skill covers the one-time scaffolding work that phase 14 of the GAD framework itself codified: turn a pile of design docs into a canonical `evals/<name>/` directory that plays nicely with tracing, scoring, and preservation.
GAD evals are not ad-hoc. They have a strict shape:
- A `REQUIREMENTS.xml` that downstream agents read as the source of truth for what to build.
- A `TRACE.json` contract: every run produces a machine-readable sidecar (decision gad-13), capped event outputs (gad-60), and runtime identity (gad-137).
- A mode (`greenfield` | `brownfield`) and workflow (`gad` | `bare` | `emergent`) declared in `gad.json` (decisions gad-39, gad-40).

Users who hand-roll these files get it wrong in small ways that silently break `gad eval run`, `gad eval preserve`, and the trace-analysis pipeline. This skill gives you a precise recipe.

Before writing anything, make sure you know:

- The project name, e.g. `escape-the-dungeon`, `portfolio-bare`.
- The workflow: `gad` (full framework), `bare` (no framework), or `emergent` (inherits skills). Default to `gad`.

If the user is vague, pick reasonable defaults and document them in a NOTES.md in the eval directory rather than blocking on a question.
```
evals/<project-name>/
├── REQUIREMENTS.xml   # Single source of truth for what the agent must build
├── SCORING.md         # The rubric (four dimensions + composite weighting)
├── gad.json           # { "mode": "...", "workflow": "...", "buildable": true }
├── NOTES.md           # (optional) assumptions, source doc lineage
└── runs/              # Populated later by gad eval run — do not create now
```
Do not create a `runs/` directory during bootstrap. It is created by `gad eval run` and must be absent until the first run, so `gad eval verify` can distinguish "bootstrapped but never run" from "run but not preserved."
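The layout above can be scaffolded mechanically. A minimal sketch in Python (`bootstrap_eval` is a hypothetical helper, not part of the GAD CLI; it creates the files empty so the recipe in this skill can fill them in):

```python
import tempfile
from pathlib import Path

def bootstrap_eval(repo_root: str, name: str) -> Path:
    """Create the canonical eval layout -- everything except runs/."""
    eval_dir = Path(repo_root) / "evals" / name
    eval_dir.mkdir(parents=True, exist_ok=False)  # fail loudly if it already exists
    for fname in ("REQUIREMENTS.xml", "SCORING.md", "gad.json", "NOTES.md"):
        (eval_dir / fname).touch()  # empty placeholders; content comes next
    # Deliberately NOT creating runs/ -- gad eval run owns that directory.
    return eval_dir

eval_dir = bootstrap_eval(tempfile.mkdtemp(), "escape-the-dungeon")
```

Note the `exist_ok=False`: bootstrapping over an existing eval directory should be an error, not a silent merge.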
The requirements file is the contract the eval agent reads. Keep it machine-parseable and keep narrative out of the body — narrative belongs in the source design doc, which you can reference.
```xml
<Requirements project="escape-the-dungeon" version="1">
  <Intent>One-paragraph statement of what a successful build looks like.</Intent>
  <SourceDocs>
    <Doc path="design/GAMEPLAY-DESIGN.xml" role="primary" />
    <Doc path="design/STAT-AND-BEHAVIOUR-TAXONOMY.md" role="taxonomy" />
  </SourceDocs>
  <Deliverables>
    <Deliverable id="D1">Playable game build served from public/</Deliverable>
    <Deliverable id="D2">At least one boss encounter implementing the taxonomy</Deliverable>
  </Deliverables>
  <Constraints>
    <Constraint>All planning artifacts live under game/.planning/</Constraint>
    <Constraint>Source code under src/, assets under public/</Constraint>
  </Constraints>
  <Acceptance>
    <Check>`npm run build` exits 0</Check>
    <Check>TRACE.json present in run directory</Check>
  </Acceptance>
</Requirements>
```
Derive each `<Deliverable>` and `<Check>` from the source docs — do not invent. If the source doc is silent on acceptance, put one `<Check>` for a successful build and note the gap in NOTES.md.
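Pulling the Acceptance checks back out (for the SCORING.md project-specific section) needs nothing beyond a stock XML parser. A sketch, assuming only the element names shown in the example above:

```python
import xml.etree.ElementTree as ET

# Trimmed to the Acceptance block for brevity; a real file has the full schema.
REQS = """\
<Requirements project="escape-the-dungeon" version="1">
  <Acceptance>
    <Check>`npm run build` exits 0</Check>
    <Check>TRACE.json present in run directory</Check>
  </Acceptance>
</Requirements>
"""

root = ET.fromstring(REQS)
# XPath-lite: every <Check> under the <Acceptance> child of the root.
checks = [check.text for check in root.findall("./Acceptance/Check")]
```

This is also a cheap smoke test for the file you just wrote: if `ET.fromstring` raises, the requirements file is not machine-parseable and downstream agents will choke on it.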
The four canonical dimensions come from task 14-04. Use them verbatim so cross-eval comparison stays meaningful — this is exactly the reason GAD chose a native harness over skill-creator's generic dimensions (decision gad-87).
```markdown
# Scoring rubric — <project-name>

## Dimensions

1. CLI efficiency — tokens + tool calls per task completed.
2. Skill trigger accuracy — correct skills triggered at the right step.
3. Planning quality — task coverage, STATE.xml freshness, DECISIONS.xml hygiene.
4. Composite — weighted combination (default 0.25 each).

## Project-specific checks

- [Listed checks derived from REQUIREMENTS.xml Acceptance block]
```
Do not invent new dimensions. If the project genuinely needs something custom, add it as a project-specific check under dimension 4, not as a fifth dimension.
```json
{
  "mode": "greenfield",
  "workflow": "gad",
  "buildable": true,
  "buildCommand": "npm run build",
  "serveCommand": "npm run preview"
}
```
`mode` and `workflow` feed `gad eval list`. `"buildable": true` is mandatory per gad-133. `buildCommand` must be real — if you don't know, use `echo no-build` and flag it in NOTES.md rather than hallucinating.
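Since a malformed `gad.json` is the most common registration failure, a pre-flight check is cheap insurance. A sketch using only the fields and enums named in this document (`validate_gad_json` is a hypothetical helper, not part of the GAD CLI):

```python
import json

VALID_MODES = {"greenfield", "brownfield"}       # per decisions gad-39 / gad-40
VALID_WORKFLOWS = {"gad", "bare", "emergent"}

def validate_gad_json(text: str) -> list:
    """Return a list of problems; an empty list means the file should register."""
    try:
        cfg = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"malformed JSON: {exc}"]
    errors = []
    if cfg.get("mode") not in VALID_MODES:
        errors.append(f"mode must be one of {sorted(VALID_MODES)}")
    if cfg.get("workflow") not in VALID_WORKFLOWS:
        errors.append(f"workflow must be one of {sorted(VALID_WORKFLOWS)}")
    if cfg.get("buildable") is not True:
        errors.append("buildable: true is mandatory (gad-133)")
    if not cfg.get("buildCommand"):
        errors.append("buildCommand must be set (use 'echo no-build' if unknown)")
    return errors

problems = validate_gad_json(
    '{"mode": "greenfield", "workflow": "gad", '
    '"buildable": true, "buildCommand": "npm run build"}'
)
```

Run it before invoking `gad eval list`; an empty `problems` list does not guarantee registration, but a non-empty one guarantees failure.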
1. Create `evals/<name>/` in the target repo.
2. Write `gad.json` first. This is how downstream tools even know the project exists.
3. Write `REQUIREMENTS.xml` referencing the source docs by path.
4. Write `SCORING.md` with the four canonical dimensions plus project-specific checks pulled from the Acceptance block of REQUIREMENTS.xml.
5. If you made assumptions, write `NOTES.md` recording them.
6. Run `gad eval list` — the new project should appear. If it does not, the most common cause is a malformed `gad.json`.

Everything after that belongs to other skills:

- Running the eval: `gad:eval-run`.
- Deciding whether a request is a run or a re-scaffold: `gad:eval-run` / `gad:eval-bootstrap`.
- Preserving completed outputs: `gad eval preserve` (mandatory per gad-38, not optional).
- Reporting: `gad eval report` or `gad:self-eval`.

Stay in your lane. Handing a user a pre-populated `runs/` directory or a hand-written `TRACE.json` will corrupt the eval pipeline's assumptions.
Derive the build command from `package.json` if present. If there is no buildable target at all, stop and tell the user the project fails gad-133.

Do not be tempted to pre-create `runs/`. It breaks `gad eval verify`.

Report back with:
- The `gad eval list` line confirming registration.

Then stop. The next step belongs to `gad:eval-run` when the user is ready to actually execute.