Audit decisions for judgment quality, compliance bias, and manipulation vulnerability. Inspired by Anthropic's Project Vend Phase 2 finding that helpfulness training creates an exploitable attack surface.
Close the judgment gap in the self-improvement loop. Pattern-extractor catches technical errors ("did it work?"). Decision-review catches reasoning failures ("should we have done this at all?").
Anthropic's Project Vend Phase 2 (2026) demonstrated that an AI agent running autonomous businesses was exploited primarily through its eagerness to please — not through technical failures. The agent's helpfulness bias was the attack surface. This skill prevents the same failure mode in our autonomous operation.
For every significant decision made in a session, ask:
- `/decision-review`: Review decisions from the current session
- `/decision-review --last 7`: Review decisions from the last 7 days
- `/decision-review --session <id>`: Review a specific session

List every significant decision made in the session:
| # | Decision | Type | External? | Reversible? |
|---|----------|------|-----------|-------------|
| 1 | Deployed to production | Action | Yes | Partially |
| 2 | Chose React over Svelte | Architecture | No | Yes |
| 3 | Agreed to skip tests | Compliance | No | Yes |
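The inventory can also be kept as structured records and rendered into the table format shown above. The following is a minimal sketch, not part of the skill itself: the `Decision` dataclass, its field names, and `render_inventory` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    """One row of the decision inventory (hypothetical structure)."""
    description: str
    kind: str        # e.g. "Action", "Architecture", "Compliance"
    external: bool   # did the decision affect anything outside the session?
    reversible: str  # "Yes", "No", or "Partially"


def render_inventory(decisions):
    """Render decisions as a Markdown table with the inventory's columns."""
    lines = [
        "| # | Decision | Type | External? | Reversible? |",
        "|---|----------|------|-----------|-------------|",
    ]
    for i, d in enumerate(decisions, start=1):
        external = "Yes" if d.external else "No"
        lines.append(
            f"| {i} | {d.description} | {d.kind} | {external} | {d.reversible} |"
        )
    return "\n".join(lines)


inventory = [
    Decision("Deployed to production", "Action", True, "Partially"),
    Decision("Chose React over Svelte", "Architecture", False, "Yes"),
    Decision("Agreed to skip tests", "Compliance", False, "Yes"),
]
table = render_inventory(inventory)
print(table)
```

Keeping the rows as data rather than free text makes it easier to filter later, for example to list only external or irreversible decisions.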
Types:
For each decision, check for red flags:
Compliance Red Flags (Vend failure pattern):
Judgment Red Flags:
Scope Red Flags:
Score each decision:
| Score | Meaning |
|---|---|
| Green | Sound judgment — premise verified, alternatives considered, appropriate scope |
| Yellow | Adequate — correct outcome but reasoning could be stronger |
| Orange | Concerning — complied without sufficient verification, got lucky |
| Red | Failed — acted on unverified premise, didn't question, or was manipulated |
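As a rough sketch of how the four-level scale might be handled in code, the example below tallies scores into a summary line and selects the decisions that need a written finding (Orange or Red). The `Score` enum, the helper names, and the scores assigned to the sample decisions are all assumptions made for illustration; the scoring itself remains a judgment call made during review.

```python
from collections import Counter
from enum import Enum


class Score(Enum):
    GREEN = "green"
    YELLOW = "yellow"
    ORANGE = "orange"
    RED = "red"


# Only these levels require a written finding.
NEEDS_FINDING = {Score.ORANGE, Score.RED}


def summarize(scores):
    """Return a summary such as '1 green, 1 yellow, 1 orange, 0 red'."""
    counts = Counter(scores)
    return ", ".join(f"{counts[s]} {s.value}" for s in Score)


def needing_findings(scored):
    """Given (description, Score) pairs, return descriptions scored Orange or Red."""
    return [desc for desc, score in scored if score in NEEDS_FINDING]


# Hypothetical scores, assigned here only to exercise the helpers.
scored = [
    ("Deployed to production", Score.YELLOW),
    ("Chose React over Svelte", Score.GREEN),
    ("Agreed to skip tests", Score.ORANGE),
]
summary = summarize(score for _, score in scored)
flagged = needing_findings(scored)
print(summary)
print(flagged)
```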
For any Orange or Red decisions:
### Decision Review Finding
**Decision**: [What was decided]
**Score**: Orange/Red
**What happened**: [Description]
**What should have happened**: [Better approach]
**Root cause**: [Why judgment failed — compliance bias? time pressure? authority deference?]
**Prevention**: [Specific check that would catch this next time]
**Add to pre-flight?**: Yes/No
If a judgment failure pattern appears 3+ times:
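One way to detect the 3+ threshold is to count the root causes recorded in past findings. This is a sketch under assumptions: it presumes each finding's root cause has been reduced to a short label, and `recurring_patterns` and `PATTERN_THRESHOLD` are hypothetical names, not part of the skill.

```python
from collections import Counter

PATTERN_THRESHOLD = 3  # "3+ times", per the rule above


def recurring_patterns(root_causes, threshold=PATTERN_THRESHOLD):
    """Return root-cause labels that appear at least `threshold` times.

    Labels are normalized (trimmed, lowercased) so that
    'Compliance bias' and 'compliance bias' count as the same pattern.
    """
    counts = Counter(cause.strip().lower() for cause in root_causes)
    return sorted(cause for cause, n in counts.items() if n >= threshold)


# Hypothetical history of root causes taken from earlier findings.
history = [
    "compliance bias",
    "time pressure",
    "Compliance bias",
    "authority deference",
    "compliance bias",
]
recurring = recurring_patterns(history)
print(recurring)
```

Exact-label matching is deliberately simple; free-text root causes would need manual grouping before counting.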
# Decision Review: [Date]
## Session Summary
- Decisions made: N
- External actions: N
- Scores: X green, Y yellow, Z orange, W red
## Decision Inventory
[Table from Phase 1]
## Red Flags Found
[Details from Phase 2]
## Findings
[From Phase 4 — only Orange/Red decisions]
## Patterns Emerging
[From Phase 5 — recurring judgment failures]
## Actions Taken
- [ ] Updated PRE_FLIGHT.md with: [specific check]
- [ ] Updated MEMORY.md with: [specific pattern]
- [ ] Logged to daily notes
Decision-review generates findings. Pattern-extractor should include a new compliance dimension:
Orange/Red findings generate new pre-flight checks in the "Before External Actions" section.
Decision review output is appended to `memory/daily/YYYY-MM-DD.md` under a `## Decision Review` section.
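The append step could look roughly like the sketch below. The path layout and section heading come from the description above; the function name `append_review`, its parameters, and the choice to create the directory when it is missing are assumptions.

```python
from datetime import date
from pathlib import Path


def append_review(review_body, root=Path("memory/daily"), day=None):
    """Append a '## Decision Review' section to the daily notes file.

    The file is <root>/YYYY-MM-DD.md; it is created if it does not exist.
    Returns the path that was written.
    """
    day = day or date.today()
    path = Path(root) / f"{day.isoformat()}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    # Open in append mode so existing daily notes are preserved.
    with path.open("a", encoding="utf-8") as f:
        f.write(f"\n## Decision Review\n\n{review_body.rstrip()}\n")
    return path
```

Calling `append_review("- Decisions made: 3")` with the defaults writes to today's file under `memory/daily/`. Passing `root` and `day` explicitly keeps the function easy to test against a temporary directory.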
Findings feed the adversarial-reviewer skill's knowledge base of known exploitation patterns.
"The system's training to be helpful became a liability, prioritizing friendliness over sound business decisions." — Anthropic, Project Vend Phase 2
The goal is not to become unhelpful. It's to distinguish between:
A truly helpful agent sometimes says "no" or "wait, let me verify that first."
- Created: 2026-02-07
- Inspired by: Anthropic Project Vend Phase 2 (2026)
- Related Skills: pattern-extractor, adversarial-reviewer, memory-manager
- Status: Active