Analyze ARIS usage logs and propose optimizations to SKILL.md files, reviewer prompts, and workflow defaults. Outer-loop harness optimization inspired by Meta-Harness (Lee et al., 2026). Use when user says "优化技能", "meta optimize", "improve skills", "分析使用记录", or wants to optimize ARIS's own harness components based on accumulated experience.
Analyze accumulated usage logs and propose optimizations for: $ARGUMENTS
ARIS is a research harness — a system of skills, bridges, workflows, and artifact contracts that wraps around LLMs to orchestrate research. This skill implements a prototype outer loop that observes how the harness is used and proposes improvements to the harness itself (not to the research artifacts it produces).
Inspired by Meta-Harness (Lee et al., 2026): the key insight is that harness design matters as much as model weights, and harness engineering can be partially automated by logging execution traces and using them to guide improvements.
| Component | Example | Optimizable? |
|---|---|---|
| SKILL.md prompts | Reviewer instructions, quality gates, step descriptions | Yes |
| Default parameters | `difficulty: medium`, `MAX_ROUNDS: 4`, threshold: 6/10 | Yes |
| Convergence rules | When to stop the review loop, retry counts | Yes |
| Workflow ordering | Skill chain sequence within a workflow | Yes |
| Artifact schemas | What fields go in EXPERIMENT_LOG.md, idea-stage/IDEA_REPORT.md | Cautious |
| MCP bridge config | Which reviewer model, routing rules | No (infra) |
Not optimized: The research artifacts themselves (papers, code, experiments). That's what the regular workflows do.
Copy `templates/claude-hooks/meta_logging.json` into your project's `.claude/settings.json` (or merge the hooks section). Hooks then append events to `.aris/meta/events.jsonl`. The skill checks data volume first and warns if it is insufficient:

```bash
EVENTS_FILE=".aris/meta/events.jsonl"
if [ ! -f "$EVENTS_FILE" ]; then
  echo "ERROR: No event log found at $EVENTS_FILE"
  echo "Enable logging first: copy templates/claude-hooks/meta_logging.json into .claude/settings.json"
  exit 1
fi
EVENT_COUNT=$(wc -l < "$EVENTS_FILE")
# grep -c prints the count (possibly 0) itself; "|| true" only swallows the
# non-zero exit status, so this works under `set -e` without appending a second 0.
SKILL_INVOCATIONS=$(grep -c '"skill_invoke"' "$EVENTS_FILE" || true)
SESSIONS=$(grep -c '"session_start"' "$EVENTS_FILE" || true)
echo "📊 Event log: $EVENT_COUNT events, $SKILL_INVOCATIONS skill invocations, $SESSIONS sessions"
if [ "$SKILL_INVOCATIONS" -lt 5 ]; then
  echo "⚠️ Insufficient data (<5 skill invocations). Continue using ARIS normally and re-run later."
  exit 0
fi
```
Read .aris/meta/events.jsonl and compute:

- **Frequency analysis**: which skills are invoked most often, and with which arguments or overridden defaults.
- **Failure analysis**: which tool calls fail (`tool_failure` events), grouped by skill and tool.
- **Convergence analysis** (for auto-review-loop): how many review rounds runs take, and how often MAX_ROUNDS is exhausted.
- **Human intervention analysis**: where users correct or override skill behavior mid-run (`user_prompt` events).

Present findings as a structured summary table.
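The frequency and failure tallies can be computed with standard shell tools. A minimal, jq-free sketch (the embedded sample log stands in for a real `.aris/meta/events.jsonl`):

```bash
#!/usr/bin/env bash
# Sketch of the Step 1 tallies; the sample log below stands in
# for a real .aris/meta/events.jsonl.
set -eu
EVENTS_FILE="$(mktemp)"
cat > "$EVENTS_FILE" <<'EOF'
{"ts":"t1","session":"s1","event":"skill_invoke","skill":"auto-review-loop","args":"difficulty: hard"}
{"ts":"t2","session":"s1","event":"skill_invoke","skill":"paper-write","args":""}
{"ts":"t3","session":"s2","event":"skill_invoke","skill":"auto-review-loop","args":""}
{"ts":"t4","session":"s2","event":"tool_failure","tool":"Bash","input_summary":"python train.py"}
EOF

echo "== Skill invocation frequency =="
FREQ=$(grep '"skill_invoke"' "$EVENTS_FILE" \
  | sed 's/.*"skill":"\([^"]*\)".*/\1/' | sort | uniq -c | sort -rn)
echo "$FREQ"

echo "== Tool failures by tool =="
grep '"tool_failure"' "$EVENTS_FILE" \
  | sed 's/.*"tool":"\([^"]*\)".*/\1/' | sort | uniq -c | sort -rn
```

A real run would point `EVENTS_FILE` at the project log; `jq -r 'select(.event=="skill_invoke") | .skill'` is an equivalent extraction if jq is available.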
Based on Step 1, rank optimization opportunities by expected impact:
## Optimization Opportunities (ranked)
| # | Target | Signal | Proposed Change | Expected Impact |
|---|--------|--------|-----------------|-----------------|
| 1 | auto-review-loop default threshold | Users override to 7/10 in 60% of runs | Change default from 6/10 to 7/10 | Fewer manual overrides |
| 2 | experiment-bridge retry count | 40% of runs hit max retries on OOM | Add OOM-specific recovery (reduce batch size) | Fewer failed experiments |
| 3 | paper-write de-AI patterns | Users manually fix "delve" in 80% of runs | Add "delve" to default watchword list | Fewer manual edits |
If $ARGUMENTS specifies a target skill, focus analysis on that skill only.
If $ARGUMENTS is empty or "all", analyze all skills with sufficient data.
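The target restriction can be a simple pre-filter over the event stream. A hedged sketch (field names follow the event schema documented later in this file; "all" means no filter):

```bash
# Restrict the event stream to one skill when $ARGUMENTS names a target.
# Session boundaries are kept so per-session stats still work.
filter_events() {
  local events_file="$1" target="${2:-all}"
  if [ "$target" = "all" ] || [ -z "$target" ]; then
    cat "$events_file"
  else
    grep -E "\"skill\":\"$target\"|\"event\":\"session_(start|end)\"" "$events_file"
  fi
}
```

Usage: `filter_events .aris/meta/events.jsonl auto-review-loop | ...` pipes the slice into the Step 1 tallies.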
For each optimization target, generate a concrete diff:
```diff
--- a/skills/auto-review-loop/SKILL.md
+++ b/skills/auto-review-loop/SKILL.md
@@ -15,7 +15,7 @@
 ## Constants
-- **SCORE_THRESHOLD = 6** — Minimum review score to accept.
+- **SCORE_THRESHOLD = 7** — Minimum review score to accept. (Raised based on usage data: 60% of users overrode to 7+.)
```
Rules for patch generation:
- Keep each diff minimal: one parameter or one prompt section per patch.
- Cite the usage-log evidence in the changed line or an adjacent comment.
- Do not touch `Cautious` or `No (infra)` components from the table above without explicit user sign-off.
Send each patch to GPT-5.4 xhigh for adversarial review:
```yaml
mcp__codex__codex:
  model: gpt-5.4
  config: {"model_reasoning_effort": "xhigh"}
  prompt: |
    You are reviewing a proposed optimization to an ARIS SKILL.md file.

    ## Original Skill (relevant section)
    [paste original]

    ## Proposed Patch
    [paste diff]

    ## Evidence from Usage Log
    [paste summary stats]

    Review this patch:
    1. Does the evidence support the change?
    2. Could this change hurt other use cases?
    3. Is the change minimal and safe?
    4. Score 1-10: should this be applied?

    If score < 7, explain what additional evidence would be needed.
```
Output a structured report:
```markdown
# ARIS Meta-Optimization Report

**Date**: [today]
**Data**: [N] events, [M] skill invocations, [K] sessions
**Target**: [skill name or "all"]

## Proposed Changes

### Change 1: [title]
- **Target**: [skill/file:line]
- **Signal**: [what the data shows]
- **Patch**: [diff]
- **Reviewer Score**: [X/10]
- **Reviewer Notes**: [summary]
- **Status**: ✅ Recommended / ⚠️ Needs more data / ❌ Rejected

### Change 2: ...

## Changes NOT Made (insufficient evidence)
- [pattern observed but too few samples]

## Recommendations
- [ ] Apply Change 1 (reviewer approved)
- [ ] Collect more data for Change 3 (need N more runs)
- [ ] Consider manual review of Change 2

## Next Steps
Run `/meta-optimize apply 1` to apply a specific change, or
`/meta-optimize apply all` to apply all recommended changes.
```
If user runs `/meta-optimize apply [N]`:
1. Back up the target file to `.aris/meta/backups/` before modifying it.
2. Apply the patch.
3. Record the applied change in `.aris/meta/optimizations.jsonl`.

**Never auto-apply without user approval.**
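The apply flow can be sketched as below (hypothetical helper; the record shape written to `optimizations.jsonl` is illustrative, and the real skill may apply diffs via `git apply` instead of `patch`):

```bash
# Hypothetical sketch of the apply step: back up, patch, record.
apply_change() {
  local target="$1" patch_file="$2" stamp
  stamp=$(date +%Y%m%d_%H%M%S)
  mkdir -p .aris/meta/backups
  cp "$target" ".aris/meta/backups/$(basename "$target").$stamp"   # 1. backup first
  patch "$target" < "$patch_file"                                  # 2. apply the reviewed diff
  printf '{"ts":"%s","target":"%s","patch":"%s"}\n' \
    "$stamp" "$target" "$patch_file" >> .aris/meta/optimizations.jsonl  # 3. record
}
```

The backup makes every applied change trivially reversible: `cp` the timestamped copy back over the target.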
The log at `.aris/meta/events.jsonl` contains JSONL records with these shapes:

```json
{"ts":"...","session":"...","event":"skill_invoke","skill":"auto-review-loop","args":"difficulty: hard"}
{"ts":"...","session":"...","event":"PostToolUse","tool":"Bash","input_summary":"pdflatex main.tex"}
{"ts":"...","session":"...","event":"codex_call","tool":"mcp__codex__codex","input_summary":"review..."}
{"ts":"...","session":"...","event":"tool_failure","tool":"Bash","input_summary":"python train.py"}
{"ts":"...","session":"...","event":"slash_command","command":"/auto-review-loop","args":""}
{"ts":"...","session":"...","event":"user_prompt","prompt_preview":"change difficulty to hard"}
{"ts":"...","session":"...","event":"session_start","source":"startup","model":"claude-opus-4-6"}
{"ts":"...","session":"...","event":"session_end"}
```
This skill is NOT part of the standard W1→W1.5→W2→W3→W4 pipeline. It is a maintenance workflow with three trigger mechanisms:
1. **Passive logging (always on)**: Claude Code hooks record events to `.aris/meta/events.jsonl` automatically during normal usage. Zero user effort.
2. **Automatic readiness check (SessionEnd hook)**: When a Claude Code session ends, `check_ready.sh` counts skill invocations since the last `/meta-optimize` run. If ≥5 new invocations have accumulated, it prints a reminder:

   > 📊 ARIS has logged 8 skill runs since last optimization. Run /meta-optimize to check for improvement opportunities.

   This is a suggestion only; it does not auto-run optimization.
3. **Manual trigger**: User runs `/meta-optimize` when they see the reminder or whenever they want.
After each /meta-optimize run, the skill writes the current timestamp to .aris/meta/.last_optimize so the readiness check only counts new invocations.
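A sketch of how `check_ready.sh` could count new invocations against the timestamp marker (illustrative only; the real hook script may differ). It relies on ISO-8601 timestamps comparing correctly as strings:

```bash
# Hypothetical sketch of check_ready.sh: count skill_invoke events whose
# timestamp is newer than the marker written by the last /meta-optimize run.
check_ready() {
  local events_file="$1" marker="$2" last new
  last=$( [ -f "$marker" ] && cat "$marker" || echo "1970-01-01T00:00:00" )
  # Split on double quotes; the field after "ts" (skipping the ":" separator)
  # is the timestamp value, compared lexicographically against the marker.
  new=$(awk -v last="$last" -F'"' \
    '/"skill_invoke"/ { for (i = 1; i < NF; i++) if ($i == "ts" && $(i+2) > last) n++ } END { print n + 0 }' \
    "$events_file")
  if [ "$new" -ge 5 ]; then
    echo "📊 ARIS has logged $new skill runs since last optimization. Run /meta-optimize to check for improvement opportunities."
  fi
}
```

If the marker file is missing (first run), every logged invocation counts as new.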
Inspired by Meta-Harness (Lee et al., 2026) — end-to-end optimization of model harnesses via filesystem-based experience access and agentic code search.
Follow these shared protocols for all output files:
- Output Versioning Protocol — write timestamped file first, then copy to fixed name
- Output Manifest Protocol — log every output to MANIFEST.md
- Output Language Protocol — respect the project's language setting
After each mcp__codex__codex or mcp__codex__codex-reply reviewer call, save the trace following shared-references/review-tracing.md. Use tools/save_trace.sh or write files directly to .aris/traces/<skill>/<date>_run<NN>/. Respect the --- trace: parameter (default: full).