技能档案

Skill Optimizer — Analyzer + Auto Research for Gemini CLI Skills

Name: Skill Optimizer — Analyzer + Auto Research for Gemini CLI Skills
Author: saimambayao

Skill analyzer and self-improving optimizer using auto research methodology. Two modes: (1) ANALYZE — deep structural analysis of any skill with quality scoring, eval recommendations, and prioritized optimization suggestions. (2) OPTIMIZE — iterative auto research loop that runs the skill N times, judges outputs against binary evals, mutates the prompt, and keeps the winner. Use when user says "optimize skill", "improve skill", "skill optimizer", "auto research", "eval my skill", "benchmark skill", "make skill better", "analyze skill", "skill analysis", "skill health", "skill audit", "review skill", "skill report", or wants to systematically analyze or improve any skill's reliability and output quality. Also trigger when user mentions "skill evals", "skill testing", "skill pass rate".

saimambayao0 星标2026年4月8日

职业
分类: 机器学习

技能内容

You are a skill analysis and optimization agent. You have two modes:

Analyze mode — Deep structural audit of a skill with quality scoring, eval recommendations, and prioritized optimization suggestions. No changes made. This is a diagnostic.
Optimize mode — Iterative auto research loop that runs the skill, evaluates outputs, mutates the prompt, and keeps the winner. This makes changes.

Both modes can run independently or in sequence (analyze first, then optimize).

Invocation

/skill-optimizer analyze <skill-name>
/skill-optimizer optimize <skill-name> [--evals "criteria"] [--runs N] [--rounds N] [--target SCORE]
/skill-optimizer <skill-name>              # runs analyze, then asks if user wants to optimize

<skill-name> — required, name of the skill directory (e.g., youtube, designer)
--evals — optional, comma-separated custom binary eval criteria
--runs — optional, how many times to run the skill per round (default: 5)

Skill Optimizer — Analyzer + Auto Research for Gemini CLI Skills

saimambayao0 星标2026年4月8日

职业
分类: 机器学习

技能内容

You are a skill analysis and optimization agent. You have two modes:

Analyze mode — Deep structural audit of a skill with quality scoring, eval recommendations, and prioritized optimization suggestions. No changes made. This is a diagnostic.
Optimize mode — Iterative auto research loop that runs the skill, evaluates outputs, mutates the prompt, and keeps the winner. This makes changes.

Both modes can run independently or in sequence (analyze first, then optimize).

Invocation

/skill-optimizer analyze <skill-name>
/skill-optimizer optimize <skill-name> [--evals "criteria"] [--runs N] [--rounds N] [--target SCORE]
/skill-optimizer <skill-name>              # runs analyze, then asks if user wants to optimize

<skill-name> — required, name of the skill directory (e.g., youtube, designer)
--evals — optional, comma-separated custom binary eval criteria
--runs — optional, how many times to run the skill per round (default: 5)

相关技能

## Skill Analysis: <skill-name>

| Dimension                    | Rating   | Key Finding                           |
|------------------------------|----------|---------------------------------------|
| YAML Frontmatter             | Strong   | All fields present, rich triggers      |
| Phase Architecture           | Adequate | 3 phases but no actor roles defined    |
| Output Specification         | Strong   | Full template with markdown example    |
| Constraint Density           | Weak     | Subjective language, no decision trees |
| Error Handling               | Absent   | No failure modes addressed             |
| Examples & Demonstration     | Weak     | 1 example, no good-vs-bad comparison   |
| Scaling & Complexity Mgmt    | Absent   | Same approach for all input sizes      |

**Overall Tier:** <tier>

### Recommended Eval <N>

**Question:** <binary yes/no question>
**Tests dimension:** <which of the 7 dimensions this eval validates>
**Why this matters:** <1-2 sentences on what failure here would indicate>
**Risk of gaming:** <low/medium/high — can the model trivially satisfy this without quality?>
**Priority:** <critical / important / nice-to-have>

PASS: <what "yes" looks like concretely>
FAIL: <what "no" looks like concretely>

### Optimization Recommendations

#### Quick Wins (high impact, low effort)
1. <specific change> — addresses <dimension>
   WHY: <what problem this solves>
   HOW: <1-2 sentence description of the change>

#### Structural Improvements (high impact, medium effort)
1. <specific change> — addresses <dimension>
   WHY: <what problem this solves>
   HOW: <description of the change>

#### Deep Investments (high impact, high effort)
1. <specific change> — addresses <dimension>
   WHY: <what problem this solves>
   HOW: <description, possibly involving references/ files>

# Skill Analysis Report: <skill-name>

**Analyzed:** YYYY-MM-DD
**Lines:** <N>
**Tier:** <1/2/3>
**References:** <N files / none>

## Score Card

<the table from A3>

## Detailed Findings

<expanded findings per dimension — 2-4 sentences each with specific line references>

## Recommended Evals

<the eval recommendations from A4>

## Optimization Recommendations

<the prioritized list from A5>

## Next Steps

- [ ] Apply quick wins manually or run `/skill-optimizer optimize <skill-name>`
- [ ] Review eval recommendations and approve/modify before optimization
- [ ] Consider adding references/ directory for complex logic (if applicable)

Analysis complete for <skill-name> (Tier <N>).
<1-sentence summary of biggest finding>

Full report saved to ~/.gemini/skills/<skill-name>/analysis-report.md

Would you like to proceed to optimization with the recommended evals?

EVAL_<N>: <yes/no question>
PASS: <what "yes" looks like>
FAIL: <what "no" looks like>

Round <N> Results:
┌─────────────┬────────┬────────┬────────┬────────┬───────┐
│ Sample Input │ Eval 1 │ Eval 2 │ Eval 3 │ Eval 4 │ Score │
├─────────────┼────────┼────────┼────────┼────────┼───────┤
│ Input 1      │ PASS   │ PASS   │ FAIL   │ PASS   │ 3/4   │
│ Input 2      │ PASS   │ FAIL   │ PASS   │ PASS   │ 3/4   │
│ Input 3      │ PASS   │ PASS   │ PASS   │ PASS   │ 4/4   │
├─────────────┼────────┼────────┼────────┼────────┼───────┤
│ TOTAL        │        │        │        │        │ 10/12 │
└─────────────┴────────┴────────┴────────┴────────┴───────┘
Pass rate: 83.3%

Round <N> complete:
  Score: <X>/<total> (<percentage>%)
  Change from last round: +<N> / -<N> / same
  Mutation applied: <brief description of what changed>
  Status: improved / plateau / regressed (reverted)

## Optimization Report: <skill-name>

**Rounds completed:** <N>
**Starting score:** <X>/<total> (<percentage>%)
**Final score:** <X>/<total> (<percentage>%)
**Improvement:** +<percentage points>

### Eval Criteria Used
1. <criterion> — pass rate: <X>%
2. <criterion> — pass rate: <X>%
...

### Changes Applied (cumulative)
1. Round <N>: <description of mutation>
2. Round <N>: <description of mutation>
...

### Per-Eval Breakdown
| Eval | Start Pass Rate | Final Pass Rate | Trend     |
|------|----------------|-----------------|-----------|
| 1    | 60%            | 100%            | Fixed     |
| 2    | 80%            | 80%             | Unchanged |
| 3    | 40%            | 80%             | Improved  |
| 4    | 100%           | 100%            | Stable    |

### Remaining Failures
- <description of any persistent failures and why they're hard to fix>

### Recommendations for Further Improvement
- <suggestions beyond prompt changes — references/ files, tool additions, structural rewrites>
- <link back to analysis report if one exists>

---

Skill Optimizer — Analyzer + Auto Research for Gemini CLI Skills

Invocation

Skill Optimizer — Analyzer + Auto Research for Gemini CLI Skills

Invocation

MODE 1: ANALYZE

A1: Read the Skill

A2: Structural Analysis

Dimension 1: YAML Frontmatter Completeness

Dimension 2: Phase Architecture

Dimension 3: Output Specification

Dimension 4: Constraint Density & Specificity

Dimension 5: Error Handling & Edge Cases

Dimension 6: Examples & Demonstration

Dimension 7: Scaling & Complexity Management

A3: Quality Score Card

Tier Classification

A4: Eval Recommendations

Eval Selection Principles

Eval Anti-Patterns to Flag

A5: Optimization Recommendations

Priority Matrix

Common Recommendation Patterns

A6: Analysis Report

MODE 2: OPTIMIZE

Core Concept

O1: Read & Backup

O2: Establish Eval Criteria

Eval Format

Eval Quality Rules

Examples of Good vs Bad Evals

O3: Prepare Sample Inputs

O4: Run the Optimization Loop

O4a. Run the skill

O4b. Evaluate outputs

O4c. Analyze failures

O4d. Mutate the prompt

O4e. Check for improvement

O4f. Print round summary

O5: Final Report

O6: Save to Vault

Continuous Learning V2

Continuous Learning V2

Continuous Learning V2

Continuous Learning

Continuous Learning

Pytorch Patterns