Audit another Codex skill for structural compliance, trigger quality, instruction clarity, reuse of scripts or references, and overall maintainability. Use when Codex is given a skill folder and needs to judge whether the skill is qualified, explain why it passes or fails, and summarize strengths, weaknesses, blockers, and improvement ideas across multiple dimensions.
Evaluate a target skill with a consistent rubric and return a clear pass/fail-style verdict plus a multi-dimensional review. Prefer the bundled script for the first pass, then turn the raw findings into a concise human-readable assessment.
Run scripts/evaluate_skill.py <path-to-skill>.
Read SKILL.md and any referenced resources before writing the final judgment.
Score the skill across these dimensions:
structure: required files, frontmatter validity, naming, obvious TODO placeholders
triggering: whether description clearly explains what the skill does and when to use it
workflow: whether the body gives actionable steps instead of vague guidance
progressive_disclosure: whether detailed material is kept in scripts or references instead of bloating SKILL.md
resources: whether scripts, references, and assets are included only when useful and are mentioned in the body
examples_and_outputs: whether the skill helps the agent understand expected usage or output shape
maintainability: clarity, concision, stale metadata checks, and overall ease of iteration
Use references/rubric.md when you need the detailed scoring logic and interpretation rules.
Use these labels:
Qualified: no critical blockers and score is strong enough for immediate use
Borderline: usable but needs material fixes soon
Not Qualified: missing required structure or too weak to trust in repeated use
Treat these as critical blockers:
Missing SKILL.md
Missing name or description
Leftover TODO placeholders
Prefer this response shape:
State Qualified, Borderline, or Not Qualified in the first sentence and explain the main reason.
Include the total score and 3-5 highest-signal dimension notes.
List concrete strengths tied to files or sections.
List concrete weaknesses tied to files or sections.
List the smallest set of changes most likely to move the skill to Qualified.
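The response shape above can be sketched as a small formatter. This is only an illustration: the Review class, the render_review function, and the section headings are hypothetical names, not part of the bundled script, and the score is treated as an opaque number because its scale is defined in references/rubric.md.

```python
from dataclasses import dataclass, field

# The three verdict labels defined in this document.
VERDICTS = ("Qualified", "Borderline", "Not Qualified")


@dataclass
class Review:
    """Hypothetical container for one finished assessment."""
    verdict: str
    reason: str
    total_score: float
    dimension_notes: list  # (dimension, note) pairs, highest-signal first
    strengths: list = field(default_factory=list)
    weaknesses: list = field(default_factory=list)
    fixes: list = field(default_factory=list)


def render_review(review: Review) -> str:
    """Render the review in the order the response shape asks for."""
    if review.verdict not in VERDICTS:
        raise ValueError(f"unknown verdict: {review.verdict!r}")
    # Verdict and main reason come first.
    lines = [f"{review.verdict}: {review.reason}"]
    # Total score plus at most five dimension notes.
    lines.append(f"Total score: {review.total_score}")
    for dimension, note in review.dimension_notes[:5]:
        lines.append(f"  {dimension}: {note}")
    # Strengths, weaknesses, then the smallest set of fixes.
    sections = (
        ("Strengths", review.strengths),
        ("Weaknesses", review.weaknesses),
        ("Smallest fixes", review.fixes),
    )
    for heading, items in sections:
        lines.append(f"{heading}:")
        lines.extend(f"  - {item}" for item in items)
    return "\n".join(lines)
```

Rejecting unknown verdict strings keeps the first sentence of every review limited to the three labels, which makes reviews easier to compare across skills.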
Run:
python3 scripts/evaluate_skill.py /absolute/path/to/skill
Optional JSON mode:
python3 scripts/evaluate_skill.py /absolute/path/to/skill --json
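When the JSON output is consumed by another program rather than read by eye, the call might be wrapped as follows. This is a sketch under assumptions: build_command and run_audit are hypothetical helpers, the JSON schema is not documented here so the result is returned as an untyped value, and the script's exit-code behavior is unknown, so a non-zero exit is not treated as an error.

```python
import json
import subprocess
import sys
from pathlib import Path


def build_command(skill_dir, script="scripts/evaluate_skill.py"):
    """Build the argument list for the JSON-mode invocation shown above."""
    return [sys.executable, script, str(Path(skill_dir)), "--json"]


def run_audit(skill_dir, script="scripts/evaluate_skill.py"):
    """Run the first-pass audit and return the parsed JSON, or None.

    Assumption: the script prints a single JSON document to stdout.
    """
    proc = subprocess.run(
        build_command(skill_dir, script),
        capture_output=True,
        text=True,
    )
    try:
        return json.loads(proc.stdout)
    except json.JSONDecodeError:
        # Output was not valid JSON; leave interpretation to the caller.
        return None
```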
The script is dependency-free and performs a deterministic first-pass audit. It is intentionally conservative: if a skill barely explains its trigger conditions or still contains template leftovers, the script should flag it instead of assuming good intent.
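To make the conservative stance concrete, here is a minimal sketch of a critical-blocker check. It is not the bundled script's implementation, which is not shown here: find_blockers is a hypothetical function, and it assumes the frontmatter is a leading block delimited by --- lines with name and description as top-level keys.

```python
import re
from pathlib import Path


def find_blockers(skill_dir):
    """Return a list of critical blockers found in a skill folder."""
    skill_md = Path(skill_dir) / "SKILL.md"
    if not skill_md.is_file():
        return ["SKILL.md is missing"]

    text = skill_md.read_text(encoding="utf-8")
    blockers = []

    # Assumed layout: frontmatter is the first block between two '---' lines.
    match = re.match(r"\A---[ \t]*\n(.*?)\n---[ \t]*(?:\n|\Z)", text, re.DOTALL)
    frontmatter = match.group(1) if match else ""

    # A key counts as present only if it has a non-empty value on its line.
    for key in ("name", "description"):
        if not re.search(rf"^{key}:[ \t]*\S", frontmatter, re.MULTILINE):
            blockers.append(f"frontmatter {key} is missing or empty")

    # Conservative: any literal TODO is flagged, even inside prose.
    if "TODO" in text:
        blockers.append("TODO placeholder remains")
    return blockers
```

Flagging every literal TODO will produce occasional false positives, which matches the intent of erring toward a flag instead of assuming good intent.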