Use when you need to challenge research assumptions, stress-test arguments, or identify weaknesses in your reasoning before reviewers do.
Based on Scott Cunningham's Part 3: "Creating Devil's Advocate Agents for Tough Problems" - addressing the "LLM thing of over-confidence in diagnosing a problem."
For formal code audits with replication scripts and referee reports, use the Referee 2 agent instead (.claude/agents/referee2-reviewer.md). This skill is for quick adversarial feedback on arguments, not systematic audits.
Read references/competing-hypotheses.md and generate 3-5 rival explanations before critiquing. The multi-turn format is inspired by the simulated scientific debates in Google's AI Co-Scientist: a one-shot critique is easy for an LLM to produce but often superficial. Multi-turn debates force each critique to survive a defense, filtering out weak objections and sharpening the strong ones.
Adopt the persona of a hostile but competent reviewer and challenge the paper's core assumptions, identification strategy, and evidence.
Produce numbered critiques (aim for 5-8), each with a concrete statement of the problem.
Switch persona to the paper's author. For each numbered critique, provide the strongest possible defense.
Switch to an impartial senior reviewer. For each critique-defense pair, rule whether the critique stands, is partially addressed, or is dismissed.
Produce a structured report with only the surviving critiques (stands + partially addressed), ranked by severity:
## Devil's Advocate Report
### Critical (must fix before submission)
1. [Critique] — [Why the defense failed] — [Suggested fix]
### Major (reviewers will likely raise)
2. [Critique] — [What remains after defense] — [Suggested fix]
### Minor (worth acknowledging)
3. [Critique] — [Residual concern] — [How to preempt]
### Dismissed
- [Critiques that were resolved in Round 2, listed briefly for transparency]
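The Round 1→2→3 flow above can be pictured as a simple orchestration loop. The sketch below is illustrative only: `ask` and `run_debate` are hypothetical names standing in for whatever model-call interface the runtime provides, and the role prompts are abbreviations of the personas described above.

```python
def run_debate(paper: str, ask) -> dict:
    """Three-round devil's-advocate debate: attack, defend, adjudicate.

    `ask(role_prompt, content)` is assumed to return the model's reply
    as a string; it is a placeholder, not a real API.
    """
    # Round 1: the hostile reviewer produces numbered critiques.
    critiques = ask(
        "hostile but competent reviewer: produce 5-8 numbered critiques",
        paper,
    )
    # Round 2: the author persona mounts the strongest defense of each.
    defenses = ask(
        "paper's author: defend each numbered critique as strongly as possible",
        critiques,
    )
    # Round 3: an impartial senior reviewer rules on each pair;
    # only surviving critiques reach the final report.
    rulings = ask(
        "impartial senior reviewer: rule stands / partially addressed / dismissed",
        critiques + "\n" + defenses,
    )
    return {"critiques": critiques, "defenses": defenses, "rulings": rulings}
```

Keeping the rounds as separate calls (rather than one long prompt) is what forces each critique to survive an explicit defense before it can appear in the report.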
For quick checks (e.g., "just poke holes in this argument"), skip the multi-turn protocol and produce a direct critique. Use when the user says "quick", "just challenge this", or the input is a paragraph rather than a full paper.
"Play devil's advocate on my research paper about preference drift - specifically challenge my identification strategy and the assumptions about utility functions."
For the highest-stakes arguments, run the devil's advocate debate across multiple LLM providers. Different models have genuinely different reasoning patterns — a critique that Claude finds weak, GPT may find devastating, and vice versa. This produces adversarial tension that a single model cannot replicate internally.
Trigger: "Council devil's advocate" or "thorough challenge"
How it works:
Invocation (CLI backend):

```bash
cd packages/cli-council
uv run python -m cli_council \
  --prompt-file /tmp/devils-advocate-prompt.txt \
  --context-file /tmp/paper-content.txt \
  --output-md /tmp/devils-advocate-council.md \
  --chairman claude \
  --timeout 180
```
See skills/shared/council-protocol.md for the full orchestration protocol.
Value: High — the multi-turn debate protocol (Round 1→2→3) becomes genuinely adversarial when different models play different roles. A critique that survives cross-model scrutiny is almost certainly a real weakness.
| Skill | When to use instead/alongside |
|---|---|
| /interview-me | To develop the idea further through structured interview |
| /multi-perspective | For multi-perspective analysis with disciplinary diversity |
| /proofread | For language/formatting review rather than argument critique |