© 2026 Skills Pool. All rights reserved.

Agent Evaluation | Skills Pool

Skill file

Agent Evaluation

Evaluate agent performance using a structured scoring rubric

cdalsoniii · 0 stars · 2026/02/13

Occupation: Software Quality Assurance Analysts and Testers
Category: LLM / AI

Skill content

Agent Evaluation Skill

Evaluate agent performance using a structured scoring rubric.

Trigger Conditions

Agent configuration change
Evaluation cadence (monthly)
User invokes with "evaluate agents" or "agent scorecard"

Input Contract

Required: Agent(s) to evaluate
Required: Evaluation criteria or rubric
Optional: Baseline scores from prior evaluation

Output Contract

Evaluation scorecard (500-point rubric)
Per-dimension scores and findings
Improvement recommendations
Comparison against baseline
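
One way to picture this output contract is as a small record type. The sketch below is illustrative only: the `Scorecard` class, its field names, and the `delta_vs_baseline` helper are assumptions made for the example, not part of the skill. Only the five 100-point dimensions and the 500-point total come from the skill text.

```python
from dataclasses import dataclass, field
from typing import Optional

# The five rubric dimensions named by the skill, 100 points each (500 total).
DIMENSIONS = ("architecture", "security", "ops", "testing", "docs")
MAX_PER_DIMENSION = 100


@dataclass
class Scorecard:
    """Hypothetical container for one agent's evaluation output."""
    agent: str
    scores: dict                                  # dimension -> points (0..100)
    findings: dict = field(default_factory=dict)  # dimension -> evidence text
    recommendations: list = field(default_factory=list)
    baseline: Optional[dict] = None               # prior scores, if any

    def __post_init__(self) -> None:
        # Every dimension must be scored, and every score must be in range.
        missing = set(DIMENSIONS) - set(self.scores)
        if missing:
            raise ValueError(f"unscored dimensions: {sorted(missing)}")
        for name, value in self.scores.items():
            if not 0 <= value <= MAX_PER_DIMENSION:
                raise ValueError(f"{name} score out of range: {value}")

    @property
    def total(self) -> int:
        """Sum over the five rubric dimensions (maximum 500)."""
        return sum(self.scores[d] for d in DIMENSIONS)

    def delta_vs_baseline(self) -> dict:
        """Per-dimension change since the prior evaluation (empty if none)."""
        if self.baseline is None:
            return {}
        return {
            d: self.scores[d] - self.baseline[d]
            for d in DIMENSIONS
            if d in self.baseline
        }
```

Keeping the baseline optional mirrors the input contract, where prior scores are not required.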

Tool Permissions

Read: Agent configs, agent output logs, telemetry
Write: Evaluation reports
Search: Agent invocation history

Quick install

npx skillvault add cdalsoniii/cdalsoniii-brightpath-coder-cursor-skills-agent-evaluation-skill-md

Author: cdalsoniii
Stars: 0
Updated: 2026/02/13

Execution Steps

Load evaluation rubric (architecture 100, security 100, ops 100, testing 100, docs 100)
For each agent, review recent outputs and effectiveness
Score each dimension with evidence
Compare against baseline scores
Identify improvement opportunities
Generate scorecard and recommendations
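
The steps above can be sketched as a small pipeline. This is a minimal sketch under stated assumptions: `evaluate`, `Evidence`, and the returned keys are hypothetical names, and picking the lowest-scoring dimensions as improvement candidates is an assumption, since the skill does not say how opportunities are ranked.

```python
from typing import NamedTuple, Optional

# Rubric from the skill: five dimensions, 100 points each.
RUBRIC = {"architecture": 100, "security": 100, "ops": 100, "testing": 100, "docs": 100}


class Evidence(NamedTuple):
    score: int  # points awarded for the dimension
    note: str   # evidence backing the score


def evaluate(scored: dict, baseline: Optional[dict] = None, top_n: int = 3) -> dict:
    """Hypothetical helper: validate scores, diff against baseline, rank weak spots."""
    # Score each dimension with evidence: reject gaps and unsupported scores.
    for dim, max_points in RUBRIC.items():
        if dim not in scored:
            raise ValueError(f"missing dimension: {dim}")
        if not scored[dim].note.strip():
            raise ValueError(f"no evidence for {dim}")
        if not 0 <= scored[dim].score <= max_points:
            raise ValueError(f"{dim} score out of range")

    # Compare against baseline scores, where a baseline was supplied.
    deltas = {
        d: scored[d].score - baseline[d]
        for d in RUBRIC
        if baseline and d in baseline
    }

    # Identify improvement opportunities. Assumption: weakest dimensions first;
    # sorted() is stable, so ties keep rubric order.
    weakest = sorted(RUBRIC, key=lambda d: scored[d].score / RUBRIC[d])[:top_n]

    # Generate the scorecard summary.
    return {
        "total": sum(scored[d].score for d in RUBRIC),
        "max_total": sum(RUBRIC.values()),
        "deltas": deltas,
        "improve_first": weakest,
    }
```

The default `top_n=3` matches the success criterion of three improvement recommendations per agent.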

Success Criteria

All dimensions scored with evidence
Comparison against prior evaluation
Top 3 improvement recommendations per agent
Overall portfolio health assessment

Escalation Rules

Escalate if any agent scores below 50% on any dimension
Escalate if agent evaluation reveals conflicting outputs between agents
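
The first rule is mechanical enough to sketch. In the snippet below, `needs_escalation` and its parameters are hypothetical names; the 50% threshold comes from the rule above, and "below" is read as strictly less than. The second rule needs a definition of "conflicting outputs" that the skill does not give, so it is not modelled here.

```python
def needs_escalation(scores: dict, max_points: int = 100, threshold: float = 0.5) -> list:
    """Return the dimensions that score strictly below threshold * max_points."""
    return [dim for dim, score in scores.items() if score / max_points < threshold]
```

An empty list means no dimension triggers escalation; a score of exactly 50 out of 100 does not trigger it under this reading.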

Example Invocations

Input: "Evaluate the security-specialist agent effectiveness"

Output: Scorecard: Security (85/100), Architecture (70/100), Ops (75/100), Testing (60/100), Docs (80/100). Total: 370/500. Findings: strong CVE detection but weak test coverage recommendations, documentation quality high but missing escalation follow-through. Top improvement: integrate with testing-specialist for security test gap analysis.

