Name: Agent Evaluation Framework
Author: v8

Agent Evaluation Framework Workflow

Use this skill to orchestrate evaluation sessions for subagents, identify procedural bottlenecks, and iteratively refine system prompts and capabilities using swarm intelligence principles.

0. Preparation

Subagent Isolation: Ensure that subagents spawned for evaluation do NOT utilize existing session brains or previous task knowledge. This is critical to maintain the integrity of meta-testing.
Worktree Pre-creation: Before running any test case, create an isolated git worktree for it using the v8-utils worktree tool. Worktrees MUST be subdirectories of the V8 repository (e.g., in a worktrees/ directory within the V8 root). Report to the user where the worktrees were created.
Test Injection: Copy the target test case into the worktree (e.g., test/mjsunit/repro.js).
Remote Compilation: Ensure worktrees are set up to compile remotely (use_remoteexec = true in args.gn) before proceeding.
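
The preparation steps above can be sketched as a small pre-flight helper. This is a minimal illustration under stated assumptions, not the v8-utils tooling itself: the function names (plan_worktree, create_worktree, inject_test, remote_compile_enabled) are hypothetical, plain `git worktree add` stands in for the v8-utils worktree tool, and the caller is assumed to know where args.gn lives for each worktree.

```python
import re
import shutil
import subprocess
from pathlib import Path


def plan_worktree(v8_root: Path, case_name: str) -> Path:
    """Return the worktree path for one test case, under <v8_root>/worktrees/.

    Raises ValueError if the resulting path would fall outside the V8 root,
    since worktrees must be subdirectories of the V8 repository.
    """
    root = v8_root.resolve()
    path = (root / "worktrees" / case_name).resolve()
    if not path.is_relative_to(root):
        raise ValueError(f"{path} is not inside the V8 root {root}")
    return path


def create_worktree(v8_root: Path, path: Path) -> None:
    """Create the worktree.

    Assumption: plain git is used here as a stand-in; the workflow itself
    calls for the v8-utils worktree tool.
    """
    subprocess.run(
        ["git", "-C", str(v8_root), "worktree", "add", "--detach", str(path)],
        check=True,
    )


def inject_test(test_case: Path, worktree: Path) -> Path:
    """Copy the target test case to test/mjsunit/repro.js in the worktree."""
    dest = worktree / "test" / "mjsunit" / "repro.js"
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(test_case, dest)
    return dest


# Matches an uncommented "use_remoteexec = true" line in args.gn text.
_REMOTEEXEC = re.compile(r"^\s*use_remoteexec\s*=\s*true\s*(#.*)?$", re.MULTILINE)


def remote_compile_enabled(args_gn_text: str) -> bool:
    """Report whether the given args.gn contents enable remote compilation."""
    return bool(_REMOTEEXEC.search(args_gn_text))
```

A driver would call plan_worktree and create_worktree once per test case, inject the test, check remote_compile_enabled on the contents of that worktree's args.gn, and then print the planned paths so their locations are reported to the user.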

1. Core Directives

2. Agent Orchestration & Lifecycle Management

3. Evaluation & Divergence Analysis

4. Iterative Process Refinement & Skepticism

5. Tooling
