Name: Purpose
Author: merceralex397-collab

Purpose

Evaluate whether a skill routes correctly and produces better output than a baseline — measuring precision, recall, output quality, and win rate against no-skill runs. Use this when validating a new skill before promotion, verifying a refinement worked, or auditing whether an existing skill still adds value. Do not use for building the test harness itself (use skill-testing-harness), benchmarking multiple variants head-to-head (use skill-benchmarking), or debugging a skill that is obviously broken (fix it first with skill-refinement).

merceralex397-collab · 0 stars · Mar 17, 2026

Occupation
Categories: Machine Learning

Measures whether a skill is working: does it route correctly (triggering when it should and staying silent when it should not), and does it produce better output than a run without the skill? The result is quantitative evidence of whether the skill adds value.
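The two questions above can be reduced to a few numbers. The following is a minimal sketch, not the skill's actual implementation: `RoutingCase`, `routing_metrics`, and `win_rate` are hypothetical names introduced for illustration, and counting a tie as half a win is an assumption rather than something this document specifies.

```python
from dataclasses import dataclass


@dataclass
class RoutingCase:
    """One labeled prompt: should the skill trigger, and did it? (hypothetical type)"""
    should_trigger: bool
    did_trigger: bool


def routing_metrics(cases):
    """Return (precision, recall) for skill routing over labeled cases."""
    tp = sum(c.should_trigger and c.did_trigger for c in cases)
    fp = sum((not c.should_trigger) and c.did_trigger for c in cases)
    fn = sum(c.should_trigger and (not c.did_trigger) for c in cases)
    # Convention (assumed): report 0.0 when a denominator is empty.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def win_rate(judgments):
    """Fraction of paired comparisons won by the skill run over the no-skill baseline.

    Each judgment is "skill", "baseline", or "tie".
    Ties count as half a win (an assumption for this sketch).
    """
    if not judgments:
        return 0.0
    wins = sum(j == "skill" for j in judgments)
    ties = sum(j == "tie" for j in judgments)
    return (wins + 0.5 * ties) / len(judgments)


# Illustrative data only: three prompts that should trigger, two that should not.
cases = [
    RoutingCase(should_trigger=True, did_trigger=True),
    RoutingCase(should_trigger=True, did_trigger=True),
    RoutingCase(should_trigger=True, did_trigger=False),   # missed trigger
    RoutingCase(should_trigger=False, did_trigger=True),   # false trigger
    RoutingCase(should_trigger=False, did_trigger=False),
]
precision, recall = routing_metrics(cases)
print(f"precision={precision:.2f} recall={recall:.2f}")

judgments = ["skill", "skill", "tie", "tie", "baseline"]
print(f"win_rate={win_rate(judgments):.2f}")
```

Precision falls when the skill fires on prompts it should ignore, and recall falls when it misses prompts it should handle. A win rate near 0.5 would suggest the skill is not clearly improving on the baseline.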

When to use this skill

Use when:

User says "is this skill working?", "evaluate this skill", "does this help?"
New skill needs validation before promotion to stable
Skill was refined and needs verification that the fix worked
Comparing skill variants to decide which to keep
Periodic audit of skill library quality

Do NOT use when:

Building the test harness/suite (use skill-testing-harness)
Running existing automated evals (use skill-eval-runner)
Benchmarking performance metrics (use skill-benchmarking)
Skill is obviously broken—fix it first (use skill-refinement)

Operating procedure

Define what "working" means for this skill:

