Skill file

Evaluate

Name: Evaluate
Author: Q00

Evaluate an execution with a three-stage verification pipeline

Q00 | 2,421 stars | 2026/03/12

Occupation: Software Quality Assurance Analysts and Testers
Category: CI/CD

Skill content

/ouroboros:evaluate

Evaluate an execution session using the three-stage verification pipeline.

Usage

/ouroboros:evaluate <session_id> [artifact]

Trigger keywords: "evaluate this", "3-stage check"

How It Works

The evaluation pipeline runs three progressive stages:

Stage 1: Mechanical Verification ($0 cost)
- Lint checks, build validation, test execution
- Static analysis, coverage measurement
- Fails fast if mechanical checks don't pass

Stage 2: Semantic Evaluation (Standard tier)
- AC compliance assessment
- Goal alignment scoring
- Drift measurement
- Reasoning explanation
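
The progressive, fail-fast behaviour described above can be sketched as follows. This is only an illustration, not the Ouroboros implementation: `StageResult`, `run_pipeline`, and the two stand-in stage functions are hypothetical names, and the check outcomes are hard-coded for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class StageResult:
    name: str
    passed: bool
    details: List[str] = field(default_factory=list)


def run_pipeline(stages: List[Callable[[], StageResult]]) -> List[StageResult]:
    """Run stages in order and stop at the first failure (fail fast)."""
    results: List[StageResult] = []
    for stage in stages:
        result = stage()
        results.append(result)
        if not result.passed:
            break  # later, more expensive stages are skipped
    return results


# Hypothetical stand-ins for the real checks; outcomes are hard-coded.
def mechanical() -> StageResult:
    checks = {"lint": True, "build": True, "test": False}
    failed = [name for name, ok in checks.items() if not ok]
    return StageResult("mechanical", not failed, failed)


def semantic() -> StageResult:
    return StageResult("semantic", True)


results = run_pipeline([mechanical, semantic])
print([(r.name, r.passed) for r in results])
```

Because the mechanical stage fails here, the semantic stage never runs, which is the point of putting the zero-cost checks first.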

Instructions

Load MCP Tools (Required first)

Use the ToolSearch tool to find and load the evaluate MCP tool:
```
ToolSearch query: "+ouroboros evaluate"
```
The tool will typically be named mcp__plugin_ouroboros_ouroboros__ouroboros_evaluate (with a plugin prefix). After ToolSearch returns, the tool becomes callable.
If ToolSearch finds the tool → proceed with the MCP-based evaluation below. If not → skip to Fallback section.
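
The branch between the MCP path and the fallback can be pictured as a lookup over the tool names that a search returned. The sketch below assumes the search result is available as a plain list of names and matches on the `ouroboros_evaluate` suffix, since the plugin prefix may vary; `pick_evaluate_tool` is a hypothetical helper, not part of ToolSearch.

```python
from typing import Iterable, Optional

# Assumption: the tool name always ends with this suffix, whatever the prefix.
TOOL_SUFFIX = "ouroboros_evaluate"


def pick_evaluate_tool(tool_names: Iterable[str]) -> Optional[str]:
    """Return the first tool whose name ends with the expected suffix."""
    for name in tool_names:
        if name.endswith(TOOL_SUFFIX):
            return name
    return None


found = pick_evaluate_tool(
    ["Read", "mcp__plugin_ouroboros_ouroboros__ouroboros_evaluate"]
)
# Use the MCP-based evaluation when a tool was found, otherwise the fallback.
mode = "mcp" if found else "fallback"
print(mode, found)
```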

Evaluation Steps

Determine what to evaluate:
- If session_id provided: Use it directly
- If no session_id: Check conversation for recent execution session IDs

Gather the artifact to evaluate:
- If user specifies a file: Read it with Read tool
- If recent execution output exists in conversation: Use that
- Ask user if unclear what to evaluate
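
One way to read the session-ID rule is "explicit argument first, otherwise the most recent ID mentioned in the conversation". The sketch below assumes IDs look like the `sess-abc-123` used in the example on this page; the real ID format is not specified here, so the pattern and the `resolve_session_id` helper are illustrative only.

```python
import re
from typing import List, Optional

# Assumption: session IDs follow the "sess-..." shape of the page's example.
SESSION_ID = re.compile(r"\bsess-[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*\b")


def resolve_session_id(explicit: Optional[str], messages: List[str]) -> Optional[str]:
    """Prefer an explicit ID; otherwise take the most recent one in the conversation."""
    if explicit:
        return explicit
    for message in reversed(messages):  # newest message first
        found = SESSION_ID.findall(message)
        if found:
            return found[-1]  # last ID mentioned in that message
    return None  # caller should ask the user


history = ["ran sess-abc-123 earlier", "execution finished: sess-def-456."]
print(resolve_session_id(None, history))
```

When the function returns `None`, the instruction to ask the user applies.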

Call the ouroboros_evaluate MCP tool:

Tool: ouroboros_evaluate
Arguments:
  session_id: <session ID>
  artifact: <the code/output to evaluate>
  seed_content: <original seed YAML, if available>
  acceptance_criterion: <specific AC to check, optional>
  artifact_type: "code"  (or "docs", "config")
  trigger_consensus: false  (true if user requests Stage 3)
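
For illustration, the argument list above could be assembled and sanity-checked before the call like this. The field names come from the list; the helper name `build_evaluate_args`, the choice to omit unset optional fields, and the restriction of `artifact_type` to the three values shown are assumptions rather than documented behaviour of the tool.

```python
from typing import Any, Dict, Optional

# Assumption: only the three artifact types shown above are accepted.
ALLOWED_ARTIFACT_TYPES = {"code", "docs", "config"}


def build_evaluate_args(
    session_id: str,
    artifact: str,
    seed_content: Optional[str] = None,
    acceptance_criterion: Optional[str] = None,
    artifact_type: str = "code",
    trigger_consensus: bool = False,
) -> Dict[str, Any]:
    """Assemble the argument mapping for an ouroboros_evaluate call."""
    if not session_id:
        raise ValueError("session_id is required")
    if artifact_type not in ALLOWED_ARTIFACT_TYPES:
        raise ValueError(f"unsupported artifact_type: {artifact_type!r}")
    args: Dict[str, Any] = {
        "session_id": session_id,
        "artifact": artifact,
        "artifact_type": artifact_type,
        "trigger_consensus": trigger_consensus,  # True only if Stage 3 is requested
    }
    # Optional fields are left out entirely when not supplied.
    if seed_content is not None:
        args["seed_content"] = seed_content
    if acceptance_criterion is not None:
        args["acceptance_criterion"] = acceptance_criterion
    return args


example_args = build_evaluate_args("sess-abc-123", "print('hello')")
print(example_args)
```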

Present results clearly:
- Show each stage's pass/fail status
- Highlight the final approval decision
- If rejected, explain the failure reason
- Suggest fixes if evaluation fails
- Always end with a 📍 suggestion based on the outcome:
  - APPROVED: 📍 Done! Your implementation passes all checks. Optional: ooo evolve to iteratively refine
  - REJECTED at Stage 1 (mechanical, code_changes_detected: true): 📍 Next: Fix the build/test failures above, then ooo evaluate — or ooo ralph for automated fix loop
  - REJECTED at Stage 1 (mechanical, code_changes_detected: false): 📍 Next: Run ooo run first to produce code, then ooo evaluate
  - REJECTED at Stage 2 (semantic): 📍 Next: ooo run to re-execute with fixes — or ooo evolve for iterative refinement
  - REJECTED at Stage 3 (consensus): 📍 Next: ooo interview to re-examine requirements — or ooo unstuck to challenge assumptions
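
The outcome-to-suggestion rules above amount to a small lookup table. A minimal sketch, returning only the suggested commands rather than the full 📍 message text; `suggest_next` and `NEXT_STEPS` are hypothetical names, and treating a missing `code_changes_detected` flag as false is an assumption.

```python
from typing import Dict, List, Optional, Tuple

# (failed stage, code_changes_detected) -> suggested follow-up commands.
# None in the second slot means the flag is not relevant for that stage.
NEXT_STEPS: Dict[Tuple[int, Optional[bool]], List[str]] = {
    (1, True): ["ooo evaluate", "ooo ralph"],
    (1, False): ["ooo run", "ooo evaluate"],
    (2, None): ["ooo run", "ooo evolve"],
    (3, None): ["ooo interview", "ooo unstuck"],
}


def suggest_next(
    approved: bool,
    failed_stage: Optional[int] = None,
    code_changes_detected: Optional[bool] = None,
) -> List[str]:
    """Map an evaluation outcome to the commands to suggest next."""
    if approved:
        return ["ooo evolve"]  # optional refinement after approval
    if failed_stage is None:
        raise ValueError("failed_stage is required when the result is rejected")
    # Only Stage 1 distinguishes on code_changes_detected (missing means False).
    flag = bool(code_changes_detected) if failed_stage == 1 else None
    return NEXT_STEPS[(failed_stage, flag)]


print(suggest_next(False, failed_stage=1, code_changes_detected=False))
```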

Example

User: /ouroboros:evaluate sess-abc-123

Evaluation Results
============================================================
Final Approval: APPROVED
Highest Stage Completed: 2

Stage 1: Mechanical Verification
  [PASS] lint: No issues found
  [PASS] build: Build successful
  [PASS] test: 12/12 tests passing

Stage 2: Semantic Evaluation
  Score: 0.85
  AC Compliance: YES
  Goal Alignment: 0.90
  Drift Score: 0.08

📍 Done! Your implementation passes all checks. Optional: `ooo evolve` to iteratively refine
