Bootstrap a brand-new GAD eval project from source design documents — create the evals/<name>/ directory, convert source docs into a canonical REQUIREMENTS.xml, wire up scoring rubric dimensions (CLI efficiency, skill trigger accuracy, planning quality), and register the project so `gad eval list` and `gad eval run <name>` pick it up. Use this skill whenever the user says they want to "add a new eval", "turn this design doc into an eval", "create an eval project from X", "scaffold escape-the-dungeon / portfolio-bare / any new eval", or mentions feeding GAMEPLAY-DESIGN / REQUIREMENTS / spec documents into the GAD eval harness. Trigger even if the user doesn't say the exact word "eval" — any request that involves taking a spec and producing a testable, traceable GAD eval project should use this skill. Do NOT use for running an already-registered eval (that's gad:eval-run) or preserving completed outputs (that's gad:eval-preserve).
Bootstrap a new GAD evaluation project from source design documents so it is immediately runnable via the standard GAD eval harness. This skill covers the one-time scaffolding work that phase 14 of the GAD framework itself codified: turn a pile of design docs into a canonical `evals/<name>/` directory that plays nicely with tracing, scoring, and preservation.
GAD evals are not ad-hoc. They have a strict shape:
- A `REQUIREMENTS.xml` that downstream agents read as the source of truth for what to build.
- A `TRACE.json` contract: every run produces a machine-readable sidecar (decision gad-13), capped event outputs (gad-60), and runtime identity (gad-137).
- A mode (`greenfield` | `brownfield`) and workflow (`gad` | `bare` | `emergent`) declared in `gad.json` (decisions gad-39, gad-40).

Users who hand-roll these files get it wrong in small ways that silently break `gad eval run`, `gad eval preserve`, and the trace-analysis pipeline. This skill gives you a precise recipe.

Before writing anything, make sure you know:

- The project name, e.g. `escape-the-dungeon`, `portfolio-bare`.
- The workflow: `gad` (full framework), `bare` (no framework), or `emergent` (inherits skills). Default to `gad`.

If the user is vague, pick reasonable defaults and document them in a NOTES.md in the eval directory rather than blocking on a question.
```
evals/<project-name>/
├── REQUIREMENTS.xml   # Single source of truth for what the agent must build
├── SCORING.md         # The rubric (four dimensions + composite weighting)
├── gad.json           # { "mode": "...", "workflow": "...", "buildable": true }
├── NOTES.md           # (optional) assumptions, source doc lineage
└── runs/              # Populated later by gad eval run — do not create now
```
Do not create a `runs/` directory during bootstrap. It is created by `gad eval run` and must be absent until the first run, so `gad eval verify` can distinguish "bootstrapped but never run" from "run but not preserved."
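The layout above can be scaffolded mechanically. A minimal sketch in Python (`bootstrap_eval` is a hypothetical helper, not part of the GAD CLI; it creates the files empty so the recipe in this skill can fill them in):

```python
import tempfile
from pathlib import Path

def bootstrap_eval(repo_root: str, name: str) -> Path:
    """Create the canonical eval layout -- everything except runs/."""
    eval_dir = Path(repo_root) / "evals" / name
    eval_dir.mkdir(parents=True, exist_ok=False)  # fail loudly if it already exists
    for fname in ("REQUIREMENTS.xml", "SCORING.md", "gad.json", "NOTES.md"):
        (eval_dir / fname).touch()  # empty placeholders; content comes next
    # Deliberately NOT creating runs/ -- gad eval run owns that directory.
    return eval_dir

eval_dir = bootstrap_eval(tempfile.mkdtemp(), "escape-the-dungeon")
```

Note the `exist_ok=False`: bootstrapping over an existing eval directory should be an error, not a silent merge.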
The requirements file is the contract the eval agent reads. Keep it machine-parseable and keep narrative out of the body — narrative belongs in the source design doc, which you can reference.
```xml
<Requirements project="escape-the-dungeon" version="1">
  <Intent>One-paragraph statement of what a successful build looks like.</Intent>
  <SourceDocs>
    <Doc path="design/GAMEPLAY-DESIGN.xml" role="primary" />
    <Doc path="design/STAT-AND-BEHAVIOUR-TAXONOMY.md" role="taxonomy" />
  </SourceDocs>
  <Deliverables>
    <Deliverable id="D1">Playable game build served from public/</Deliverable>
    <Deliverable id="D2">At least one boss encounter implementing the taxonomy</Deliverable>
  </Deliverables>
  <Constraints>
    <Constraint>All planning artifacts live under game/.planning/</Constraint>
    <Constraint>Source code under src/, assets under public/</Constraint>
  </Constraints>
  <Acceptance>
    <Check>`npm run build` exits 0</Check>
    <Check>TRACE.json present in run directory</Check>
  </Acceptance>
</Requirements>
```
Derive each `<Deliverable>` and `<Check>` from the source docs — do not invent. If the source doc is silent on acceptance, put one `<Check>` for a successful build and note the gap in NOTES.md.
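Pulling the Acceptance checks back out (for the SCORING.md project-specific section) needs nothing beyond a stock XML parser. A sketch, assuming only the element names shown in the example above:

```python
import xml.etree.ElementTree as ET

# Trimmed to the Acceptance block for brevity; a real file has the full schema.
REQS = """\
<Requirements project="escape-the-dungeon" version="1">
  <Acceptance>
    <Check>`npm run build` exits 0</Check>
    <Check>TRACE.json present in run directory</Check>
  </Acceptance>
</Requirements>
"""

root = ET.fromstring(REQS)
# XPath-lite: every <Check> under the <Acceptance> child of the root.
checks = [check.text for check in root.findall("./Acceptance/Check")]
```

This is also a cheap smoke test for the file you just wrote: if `ET.fromstring` raises, the requirements file is not machine-parseable and downstream agents will choke on it.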
The four canonical dimensions come from task 14-04. Use them verbatim so cross-eval comparison stays meaningful — this is exactly the reason GAD chose a native harness over skill-creator's generic dimensions (decision gad-87).
```markdown
# Scoring rubric — <project-name>

## Dimensions

1. CLI efficiency — tokens + tool calls per task completed.
2. Skill trigger accuracy — correct skills triggered at the right step.
3. Planning quality — task coverage, STATE.xml freshness, DECISIONS.xml hygiene.
4. Composite — weighted combination (default 0.25 each).

## Project-specific checks

- [Listed checks derived from REQUIREMENTS.xml Acceptance block]
```
Do not invent new dimensions. If the project genuinely needs something custom, add it as a project-specific check under dimension 4, not as a fifth dimension.
```json
{
  "mode": "greenfield",
  "workflow": "gad",
  "buildable": true,
  "buildCommand": "npm run build",
  "serveCommand": "npm run preview"
}
```
`mode` and `workflow` feed `gad eval list`. `"buildable": true` is mandatory per gad-133. `buildCommand` must be real — if you don't know, use `echo no-build` and flag it in NOTES.md rather than hallucinating.
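Since a malformed `gad.json` is the most common registration failure, a pre-flight check is cheap insurance. A sketch using only the fields and enums named in this document (`validate_gad_json` is a hypothetical helper, not part of the GAD CLI):

```python
import json

VALID_MODES = {"greenfield", "brownfield"}       # per decisions gad-39 / gad-40
VALID_WORKFLOWS = {"gad", "bare", "emergent"}

def validate_gad_json(text: str) -> list:
    """Return a list of problems; an empty list means the file should register."""
    try:
        cfg = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"malformed JSON: {exc}"]
    errors = []
    if cfg.get("mode") not in VALID_MODES:
        errors.append(f"mode must be one of {sorted(VALID_MODES)}")
    if cfg.get("workflow") not in VALID_WORKFLOWS:
        errors.append(f"workflow must be one of {sorted(VALID_WORKFLOWS)}")
    if cfg.get("buildable") is not True:
        errors.append("buildable: true is mandatory (gad-133)")
    if not cfg.get("buildCommand"):
        errors.append("buildCommand must be set (use 'echo no-build' if unknown)")
    return errors

problems = validate_gad_json(
    '{"mode": "greenfield", "workflow": "gad", '
    '"buildable": true, "buildCommand": "npm run build"}'
)
```

Run it before invoking `gad eval list`; an empty `problems` list does not guarantee registration, but a non-empty one guarantees failure.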
1. Create `evals/<name>/` in the target repo.
2. Write `gad.json` first. This is how downstream tools even know the project exists.
3. Write `REQUIREMENTS.xml` referencing the source docs by path.
4. Write `SCORING.md` with the four canonical dimensions plus project-specific checks pulled from the Acceptance block of REQUIREMENTS.xml.
5. If you made assumptions, write `NOTES.md` recording them.
6. Run `gad eval list` — the new project should appear. If it does not, the most common cause is a malformed `gad.json`.

Everything after that belongs to other skills:

- Running the eval: `gad:eval-run`.
- Deciding whether a request is a run or a re-scaffold: `gad:eval-run` / `gad:eval-bootstrap`.
- Preserving completed outputs: `gad eval preserve` (mandatory per gad-38, not optional).
- Reporting: `gad eval report` or `gad:self-eval`.

Stay in your lane. Handing a user a pre-populated `runs/` directory or a hand-written `TRACE.json` will corrupt the eval pipeline's assumptions.
Derive the build command from `package.json` if present. If there is no buildable target at all, stop and tell the user the project fails gad-133.

Do not be tempted to pre-create `runs/`. It breaks `gad eval verify`.

Report back with:
- The `gad eval list` line confirming registration.

Then stop. The next step belongs to `gad:eval-run` when the user is ready to actually execute.