Use when reviewing code and PRs created by Codegen agents — two-stage review process. Stage 1 checks spec compliance (did the agent do what was asked?). Stage 2 checks code quality (is the code good?). Triggers after agent runs complete.
Agent-created code requires verification before merging. Trust but verify — agents complete most tasks correctly, but edge cases, misunderstandings, and quality issues slip through.
Iron Law: NO MERGE WITHOUT REVIEW.
Two-stage process:
Announce at start: "I'm using the reviewing-agent-output skill to review this agent's work."
When to review:
- A run reaches `completed` status with a PR
- Between tasks in the executing-via-codegen workflow

Before reviewing, gather the context:
- `codegen_get_run(run_id)` — final status, `summary`, and `pull_requests`
- `codegen_get_logs(run_id, limit=30, reverse=false)` — chronological activity

Stage 1: Spec Compliance
Question: Did the agent do what was asked?
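The gathering step can be sketched in Python. The stub functions below stand in for the Codegen MCP tools, and their return shapes (`status`, `summary`, `pull_requests`) are assumptions drawn from how this skill uses them; real calls go through the MCP client:

```python
# Stand-ins for the Codegen MCP tools; return values are illustrative only.
def codegen_get_run(run_id):
    return {"status": "completed",
            "summary": "Added /health endpoint with tests",
            "pull_requests": ["https://github.com/example/repo/pull/42"]}

def codegen_get_logs(run_id, limit=30, reverse=False):
    return [{"tool": "Edit", "input": "src/app.py: added /health route"},
            {"tool": "Bash", "input": "pytest -q"}]

def gather_context(run_id):
    """Collect everything Stage 1 needs before reading logs in detail."""
    run = codegen_get_run(run_id)
    logs = codegen_get_logs(run_id, limit=30, reverse=False)
    return {"status": run.get("status"),
            "summary": run.get("summary"),
            "prs": run.get("pull_requests", []),
            "logs": logs}

context = gather_context(123)
```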
From logs:
- Bash calls with test commands → check output for PASS/FAIL
- Edit/Write calls → verify the correct files were modified
- `gh pr create` → verify the PR was created

From run result:
- `pull_requests` array for PR links
- `summary` for the agent's own description of what it did

| Verdict | Meaning | Action |
|---|---|---|
| PASS | All requirements met | Proceed to Stage 2 |
| PARTIAL | Some requirements met, some missing | Resume agent with specific missing items |
| FAIL | Wrong approach or major misunderstanding | Create new run with clarified prompt |
| UNCLEAR | Can't determine from logs alone | Need to read the actual PR diff |
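The verdict table above can be reduced to a small decision helper. The function name and its inputs are illustrative, not part of the skill:

```python
def stage1_verdict(missing_items, wrong_approach=False, evidence_clear=True):
    """Map Stage 1 findings to a verdict from the table above.

    missing_items: requirements not verified in logs or run result.
    wrong_approach: the agent misunderstood the task entirely.
    evidence_clear: logs were enough to judge; if not, read the PR diff.
    """
    if not evidence_clear:
        return "UNCLEAR"
    if wrong_approach:
        return "FAIL"
    return "PARTIAL" if missing_items else "PASS"
```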
If PARTIAL: `codegen_resume_run(run_id, prompt="You missed: [specific items]. Please complete them.")`
If FAIL: Use debugging-failed-runs skill to diagnose, then prompt-crafting for a better prompt.
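The PARTIAL resume prompt can be templated from the specific missing items; a minimal sketch (the helper name is hypothetical):

```python
def partial_resume_prompt(missing_items):
    """Build the resume prompt from the specific missing items."""
    bullets = "\n".join(f"- {item}" for item in missing_items)
    return f"You missed:\n{bullets}\nPlease complete them."
```

Listing the items as bullets keeps the resumed agent focused on exactly the gaps, rather than re-doing work that already passed.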
Stage 2: Code Quality
Question: Is the code good enough to merge?
Only proceed to Stage 2 if Stage 1 passed.
Check for leftover debug code: no `console.log`, no commented-out code, no TODOs.

From logs (quick check):
- Scan Edit tool inputs for `console.log`

From PR (thorough check):
- Run `gh pr diff <number>` locally if the repo is available

| Verdict | Meaning | Action |
|---|---|---|
| APPROVE | Code is good to merge | Report to user, suggest merge |
| MINOR | Small issues, fixable by agent | Resume with specific fixes needed |
| MAJOR | Significant quality issues | New run with more guidance, or fix locally |
| REJECT | Fundamentally wrong approach | Close PR, rethink the task |
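The quick log check can be scripted. Below is a sketch that scans Edit tool inputs for the debug artifacts named earlier; the regex patterns are rough heuristics, not an exhaustive linter:

```python
import re

# Rough heuristics for leftover debug artifacts; patterns are illustrative.
SMELLS = {
    "console.log": re.compile(r"console\.log\s*\("),
    "TODO": re.compile(r"\bTODO\b"),
    "commented-out code": re.compile(r"^\s*//.*[;{}]\s*$", re.MULTILINE),
}

def quick_quality_scan(edit_inputs):
    """Scan Edit tool inputs from the run logs for leftover debug artifacts."""
    found = set()
    for text in edit_inputs:
        for name, pattern in SMELLS.items():
            if pattern.search(text):
                found.add(name)
    return sorted(found)
```

A clean result here does not replace reading the PR diff; it only rules out the most obvious smells cheaply.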
If MINOR: `codegen_resume_run(run_id, prompt="Quality issues to fix: [list specific issues]")`
If MAJOR: Consider whether a new run or local fix is more efficient.
Present review results to the user as:
Agent Output Review — Run #<id>
================================
Stage 1: Spec Compliance — [PASS/PARTIAL/FAIL]
[✓] Task completed
[✓] Correct files modified
[✓] Tests written and passing
[✓] PR created: <link>
[✗] Missing: <what's missing, if any>
Stage 2: Code Quality — [APPROVE/MINOR/MAJOR/REJECT]
[✓] No obvious bugs
[✓] Error handling present
[✓] Consistent style
[!] Minor: <issue description>
Recommendation: [Merge / Fix then merge / Rework / Reject]
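The report template above can be rendered mechanically; a sketch with a hypothetical helper, where each check is a `(mark, description)` pair using the template's `✓`/`✗`/`!` marks:

```python
def format_review(run_id, stage1, stage1_checks, stage2, stage2_checks,
                  recommendation):
    """Render the two-stage review report in the template's layout."""
    lines = [f"Agent Output Review — Run #{run_id}",
             "=" * 32,
             f"Stage 1: Spec Compliance — {stage1}"]
    lines += [f"[{mark}] {desc}" for mark, desc in stage1_checks]
    lines.append(f"Stage 2: Code Quality — {stage2}")
    lines += [f"[{mark}] {desc}" for mark, desc in stage2_checks]
    lines.append(f"Recommendation: {recommendation}")
    return "\n".join(lines)

report = format_review(
    101, "PASS",
    [("✓", "Task completed"), ("✓", "PR created: <link>")],
    "MINOR",
    [("✓", "No obvious bugs"), ("!", "Minor: stray console.log")],
    "Fix then merge")
```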
When reviewing as part of executing-via-codegen (between tasks):
The goal is efficiency: don't spend 5 minutes reviewing every trivial task, but don't skip review for complex or risky tasks.
| Task Type | Review Depth | Why |
|---|---|---|
| Add simple model/schema | Stage 1 only | Low risk, easily verified |
| Add API endpoint with auth | Both stages | Security-sensitive |
| Refactor across files | Both stages | High risk of breaking things |
| Add/update tests only | Stage 1 only | Tests verify themselves |
| Database migration | Both stages | Hard to undo |
| UI component | Stage 1 + visual check | Need to see the output |
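The depth table can be encoded as a lookup; the dictionary keys and the safe default (full review for unlisted task types) are assumptions layered on the table above:

```python
# Review-depth lookup mirroring the table; unlisted task types default to
# the fuller review (assumption: err on the safe side).
REVIEW_DEPTH = {
    "add simple model/schema": "Stage 1 only",
    "add api endpoint with auth": "Both stages",
    "refactor across files": "Both stages",
    "add/update tests only": "Stage 1 only",
    "database migration": "Both stages",
    "ui component": "Stage 1 + visual check",
}

def review_depth(task_type):
    return REVIEW_DEPTH.get(task_type.strip().lower(), "Both stages")
```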
If the superpowers `requesting-code-review` skill is available, prefer it for deep reviews.
For routine reviews, this skill's two-stage process is sufficient and faster.