Multi-Model Review

Name: Multi-Model Review
Author: dhk

Orchestrates a multi-model review workflow using Claude as orchestrator and Codex (via MCP) as a second-opinion reviewer. Produces adversarial, constructive, or debate-style critiques with a synthesis that highlights where the models diverge — which is where the real signal is. Use this skill whenever the user says anything like "review this with Codex", "get a second opinion", "red team this", "have Codex critique this", "multi-model review", "ask Codex what it thinks", "steelman this proposal", "debate this", or "review this code with another model". Triggers any time the user wants two AI perspectives on a proposal, draft, design, or codebase.

dhk · 0 stars · April 5, 2026

Category: Debugging

Skill Content

Uses Claude as orchestrator and Codex as a second-opinion reviewer via MCP. The value is in the divergence — where models disagree is where you should investigate.

Prerequisites

Codex must be registered as an MCP server. Verify with:

claude mcp list

If codex is not listed, ask the user to run:

claude mcp add codex -- npx codex mcp-server
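
The check above could be scripted. A minimal sketch, assuming `claude mcp list` prints one server per line in the form `<name>: <command>` (that output format is an assumption and may differ in practice); `is_registered` and `list_mcp_servers` are hypothetical helper names.

```python
import subprocess


def is_registered(listing: str, name: str = "codex") -> bool:
    """Return True if any line of the listing names the given server.

    Assumes one server per line in the form "<name>: <command> ...";
    the real output of `claude mcp list` may differ.
    """
    for line in listing.splitlines():
        if line.strip().split(":", 1)[0].strip() == name:
            return True
    return False


def list_mcp_servers() -> str:
    """Run `claude mcp list` and return its stdout (requires the claude CLI)."""
    result = subprocess.run(
        ["claude", "mcp", "list"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

If `is_registered(list_mcp_servers())` is false, the skill would fall back to asking the user to run the `claude mcp add` command shown above.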

Choosing a Workflow

Ask the user which review mode they want, or infer from context:

Mode	Use when
`/review-redteam`	Proposal, plan, design doc — find what breaks


/review-steelman
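
Inferring the mode from context might look like the keyword lookup below. This is only a sketch: the trigger phrases come from the skill description, but the mapping, the `infer_mode` name, and the choice to return `None` (meaning: ask the user) are assumptions. Only `/review-redteam` and `/review-steelman` appear as commands in this file, so the `debate` and `code` labels are placeholders.

```python
from typing import Optional

# Hypothetical keyword-to-mode table; checked in order, first match wins.
MODE_KEYWORDS = [
    ("redteam", ("red team", "redteam", "find what breaks")),
    ("steelman", ("steelman",)),
    ("debate", ("debate",)),
    ("code", ("review this code", "code review")),
]


def infer_mode(request: str) -> Optional[str]:
    """Return a mode label, or None when the request is ambiguous."""
    text = request.lower()
    for mode, keywords in MODE_KEYWORDS:
        if any(keyword in text for keyword in keywords):
            return mode
    return None  # caller should ask the user which mode they want
```

A request such as "get a second opinion" matches no keyword, so the orchestrator would ask rather than guess.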

[HIGH] <finding title>
Why it fails: ...
Consequence if ignored: ...
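
Findings written in this shape are easy to parse mechanically. A sketch, assuming each finding starts with a bracketed severity on its own line and that the two labelled lines follow it; the `Finding` class and `parse_findings` function are illustrative names, not part of the skill.

```python
import re
from dataclasses import dataclass

# Matches a header line such as "[HIGH] No rollback plan".
HEADER = re.compile(r"^\[(?P<severity>[A-Za-z]+)\]\s+(?P<title>.+)$")


@dataclass
class Finding:
    severity: str
    title: str
    why: str = ""
    consequence: str = ""


def parse_findings(text: str) -> list[Finding]:
    """Collect findings from review text in the "[SEVERITY] title" format."""
    findings: list[Finding] = []
    for raw in text.splitlines():
        line = raw.strip()
        match = HEADER.match(line)
        if match:
            # Normalize severity to upper case so "[low]" and "[LOW]" compare equal.
            findings.append(Finding(match["severity"].upper(), match["title"]))
        elif findings and line.startswith("Why it fails:"):
            findings[-1].why = line.split(":", 1)[1].strip()
        elif findings and line.startswith("Consequence if ignored:"):
            findings[-1].consequence = line.split(":", 1)[1].strip()
    return findings
```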

Red Team prompt

You are an adversarial reviewer. Your job is to find problems, not validate.

For the following content, identify:
- At least 5 specific failure modes or weaknesses (label each High/Med/Low severity)
- Unstated assumptions that could break this
- The single most critical issue that must be addressed

Do not praise or validate. Only critique.

CONTENT:
{content}

Steelman prompt

You are a constructive strategic reviewer. Assume this proposal is directionally correct.

Identify:
- The core insight that makes this valuable (one sentence)
- 3 concrete ways to make this significantly stronger
- The single condition that, if true, makes this succeed

CONTENT:
{content}

Debate prompt

You are arguing AGAINST the following proposal. Find every reason it fails.
Be specific. Be relentless. Do not hedge.

PROPOSAL:
{content}

Code Review prompt

Review the following code. Focus specifically on:
- Security vulnerabilities
- Performance bottlenecks
- Missing error handling or edge cases
- Anything that would fail in production

Do not comment on style. Only flag substantive issues with severity (Critical/High/Med/Low).

CODE:
{content}
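
Each prompt above ends with a `{content}` placeholder. One way to fill it is Python's `str.format`, sketched here. The `TEMPLATES` keys and the `build_prompt` function are hypothetical, and the two templates are abbreviated to their opening lines for brevity.

```python
# Abbreviated copies of two prompts from this skill; the full text is above.
TEMPLATES = {
    "redteam": (
        "You are an adversarial reviewer. "
        "Your job is to find problems, not validate.\n\n"
        "CONTENT:\n{content}"
    ),
    "debate": (
        "You are arguing AGAINST the following proposal. "
        "Find every reason it fails.\n\n"
        "PROPOSAL:\n{content}"
    ),
}


def build_prompt(mode: str, content: str) -> str:
    """Substitute the material under review into the chosen template."""
    try:
        template = TEMPLATES[mode]
    except KeyError:
        raise ValueError(f"unknown review mode: {mode!r}") from None
    # Braces inside `content` are safe: only the template is parsed by format().
    return template.format(content=content)
```

Building the prompt once and handing the same text to both reviewers keeps the two reviews comparable.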

Assume the proposal is wrong. What breaks first?

## Claude's [Mode] Review
[Claude's full findings, unmerged]

---

## Codex's [Mode] Review
[Codex's full findings, verbatim or closely paraphrased — do not filter]

---

## Synthesis

CONSENSUS (both flagged):
- ...

DIVERGENT — Claude only:
- ...

DIVERGENT — Codex only:
- ...

THE CRUX: [the single most important unresolved question]

RECOMMENDED NEXT STEP: [one concrete action]
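
The consensus and divergent buckets in the synthesis amount to set operations over the two models' findings. A minimal sketch, assuming findings have already been reduced to comparable titles (matching two models' differing wording is the hard part and still needs judgment); `partition_findings` is an illustrative name.

```python
def partition_findings(claude: list[str], codex: list[str]) -> dict[str, list[str]]:
    """Split finding titles into consensus and per-model divergent lists.

    Titles are compared case-insensitively after trimming whitespace;
    the original wording and order from each model are preserved.
    """
    def key(title: str) -> str:
        return title.strip().lower()

    claude_keys = {key(t) for t in claude}
    codex_keys = {key(t) for t in codex}
    return {
        "consensus": [t for t in claude if key(t) in codex_keys],
        "claude_only": [t for t in claude if key(t) not in codex_keys],
        "codex_only": [t for t in codex if key(t) not in claude_keys],
    }
```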

Signal	Noise
Claude flags X, Codex doesn't	Both say "looks good"
Codex raises issue Claude missed	Both list the same generic risks
Models disagree on severity	One model echoes the other's framing
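
Of the signals in the table, severity disagreement is the simplest to detect automatically. A sketch, assuming each review has been reduced to a mapping from finding title to severity label; the function name and the ranking of labels are assumptions for illustration.

```python
# Assumed ordering of the severity labels used in the prompts above.
RANK = {"low": 0, "med": 1, "high": 2, "critical": 3}


def severity_disagreements(
    claude: dict[str, str], codex: dict[str, str]
) -> list[tuple[str, str, str]]:
    """Return (title, claude_severity, codex_severity) where the labels differ.

    Only findings present in both reviews are compared. Results are sorted
    by the size of the gap, largest first. Labels outside RANK raise KeyError.
    """
    shared = [t for t in claude if t in codex]
    differing = [
        (t, claude[t], codex[t])
        for t in shared
        if claude[t].lower() != codex[t].lower()
    ]
    return sorted(
        differing,
        key=lambda item: abs(RANK[item[1].lower()] - RANK[item[2].lower()]),
        reverse=True,
    )
```

The widest gaps come first, since a finding one model rates Low and the other Critical is the most likely to deserve a closer look.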

Step 1: Run Both Reviews in Parallel

Red Team prompt (Claude's own)

Steelman prompt (Claude's own)

Debate prompt (Claude's own)

Code Review prompt (Claude's own)

Step 2: Codex Prompt (runs in parallel with Step 1)

Red Team prompt for Codex

Steelman prompt for Codex

Debate prompt for Codex

Code Review prompt for Codex

Step 3: Show Raw Outputs, Then Synthesize

Signal vs. Noise

Edge Cases
