Name: scaffold-traced-eval-project
Author: B2Gdevs

scaffold-traced-eval-project

Scaffold a new GAD eval project end-to-end — converts loose source docs (design docs, requirements, domain notes) into a complete evals/<name>/ directory with REQUIREMENTS.xml, a registered TRACE.json schema fragment, a GAD-native SCORING.md rubric, a gad.json with mode/workflow fields, and a starter template/ folder, then prints the exact `gad eval run <name>` invocation. Use this skill whenever the user says "set up a new eval project", "turn this design doc into an eval", "make this testable as an eval", "we should add a benchmark for <domain>", or otherwise wants to go from raw source material to a runnable GAD eval. Always trigger this even if the user only mentions the design doc and eval in the same sentence — the scaffolding has a very specific recurring shape (decisions gad-13, gad-38, gad-39, gad-87, gad-88) that is easy to get wrong by hand. Do NOT trigger for running an existing eval — that is `gad eval run` / `gad-eval-run` skill territory.

B2Gdevs · 0 stars · 13.04.2026

Profession
Categories: Automation tools

Why this skill exists

Every new GAD eval project needs the same four artifacts assembled in the same order, and each one has a non-obvious constraint that bit us in phase 14:

REQUIREMENTS.xml in GAD's XML shape (not freeform markdown) so `gad eval list` and the preservation pipeline can parse it.
A trace schema fragment registered in lib/trace-schema.cjs. Earlier attempts used freeform JSON or inline schemas and broke the moment a second eval needed different fields (three failed attempts on task 14-03). The fragment-registration pattern is the one that survived.
SCORING.md that uses GAD-native dimensions — CLI efficiency, skill trigger accuracy, planning quality — not Anthropic's generic ones (decision gad-87). Programmatic, trace-derived metrics first; human review second (decision gad-69).
gad.json with both `mode` (greenfield|brownfield) and `workflow` (gad|bare|emergent) populated, plus `experimental: true` for new domains (decision gad-88). Missing either field breaks `gad eval list` (decision gad-39).
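
The gad.json constraint above can be checked before running anything. The following is a minimal sketch, not part of GAD itself: the function name `check_gad_config` is hypothetical, and it assumes `mode`, `workflow`, and `experimental` are top-level keys of gad.json. Only the field names and allowed values come from the text above; the rest of the file's shape is not covered.

```python
import json

# Allowed values as listed in the text above.
# Assumption: both fields live at the top level of gad.json.
ALLOWED_MODES = {"greenfield", "brownfield"}
ALLOWED_WORKFLOWS = {"gad", "bare", "emergent"}


def check_gad_config(config: dict) -> list[str]:
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    if config.get("mode") not in ALLOWED_MODES:
        problems.append(
            f"mode must be one of {sorted(ALLOWED_MODES)}, got {config.get('mode')!r}"
        )
    if config.get("workflow") not in ALLOWED_WORKFLOWS:
        problems.append(
            f"workflow must be one of {sorted(ALLOWED_WORKFLOWS)}, got {config.get('workflow')!r}"
        )
    # experimental is optional here, but must be a boolean when present.
    if not isinstance(config.get("experimental", False), bool):
        problems.append("experimental must be a boolean when present")
    return problems


# A config for a new domain, using the three fields named in the text.
example = json.loads('{"mode": "greenfield", "workflow": "gad", "experimental": true}')
print(check_gad_config(example))

# A config missing workflow, the kind of omission the text warns about.
print(check_gad_config({"mode": "greenfield"}))
```

Running a check like this right after scaffolding would surface a missing `mode` or `workflow` at the point of creation rather than later, when downstream tooling fails.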

Doing this by hand is a session's worth of work and has repeatedly produced artifacts that silently break downstream tooling. This skill is the recipe that keeps landing on the same shape.
