Before starting ANY significant task (feature build, refactor, integration, migration, or architectural change), first imagine the project has failed. Generate 3-5 specific failure scenarios, assess risk levels, identify mitigations, then adjust the implementation plan to address high-risk items FIRST. Do not start coding until the pre-mortem is acknowledged.
Gary Klein's research on prospective hindsight shows teams who imagine failure identify 30% more risks than teams who just "plan carefully." The mental shift from "how do we succeed?" to "why did we fail?" unlocks different thinking.
Do NOT start writing code. First, run the risk analysis script:

```
python scripts/analyse_risk.py --task "<task description>" --path .
```

Or manually assess by examining the task yourself:

Imagine it's 2 weeks later. The task failed. Ask: "Why did it fail?"

Generate 3-5 specific failure scenarios. For each, record:

| Component | Description |
|---|---|
| Scenario | Concrete failure description |
| Likelihood | HIGH / MEDIUM / LOW |
| Impact | What breaks if this happens |
| Detection | How would we know it failed |
| Mitigation | How to prevent or reduce risk |

<failure-categories>
Common failure patterns to consider:
- Integration failures: webhook handling, signature validation, test vs live credentials
- Data failures: migrations, broken unique constraints, cascade deletes
- Logic failures: missing idempotency, unhandled error states, edge cases
- Operational failures: no logging, no rollback path, no monitoring
</failure-categories>

<severity>
**HIGH** - Must mitigate before proceeding:
- Data loss or corruption possible
- Security vulnerability
- Money/billing affected
- No rollback path
- Silent failure (looks fine, actually broken)

**MEDIUM** - Mitigate or explicitly accept the risk.

**LOW** - Note the risk, but proceed.
</severity>

Reorder the implementation plan so that HIGH-risk mitigations come first, then MEDIUM, then the remaining work.

Format output as:

Pre-mortem Analysis: [Task Name]

If this fails in 2 weeks, it's probably because:

1. HIGH: [Scenario]
   Impact: [What breaks]
   Detection: [How we would know]
   Mitigation: [How to prevent]
2. HIGH: [Scenario]
   Impact: [What breaks]
   Detection: [How we would know]
   Mitigation: [How to prevent]
3. MED: [Scenario]
   Impact: [What breaks]
   Detection: [How we would know]
   Mitigation: [How to prevent]
4. MED: [Scenario]
   Impact: [What breaks]
   Detection: [How we would know]
   Mitigation: [How to prevent]
5. LOW: [Scenario]
   Impact: [What breaks]
   Detection: [How we would know]
   Mitigation: [How to prevent]

Adjusted Plan:
[Reordered implementation addressing HIGH risks first]

Proceed with this plan?

Do NOT proceed until the user confirms. They may approve the plan as-is, reprioritize scenarios, or explicitly accept a named risk.

<failed-attempts>
What DOESN'T work:
- Starting to code before the pre-mortem is acknowledged
- Asking "how do we succeed?" instead of "why did we fail?" - the success framing surfaces fewer risks
</failed-attempts>
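The scenario table and severity ordering above can be sketched as a small data structure; all names here are illustrative, not part of the skill:

```python
from dataclasses import dataclass

# Severity ranks mirror the HIGH / MEDIUM / LOW levels defined above.
SEVERITY_RANK = {"HIGH": 0, "MED": 1, "LOW": 2}

@dataclass
class Scenario:
    description: str   # Concrete failure description
    likelihood: str    # HIGH / MED / LOW
    impact: str        # What breaks if this happens
    detection: str     # How we would know it failed
    mitigation: str    # How to prevent or reduce risk

def reorder_plan(scenarios: list[Scenario]) -> list[str]:
    """Return mitigation steps ordered so HIGH-risk items come first."""
    ranked = sorted(scenarios, key=lambda s: SEVERITY_RANK[s.likelihood])
    return [s.mitigation for s in ranked]
```

Sorting is stable, so scenarios of equal severity keep the order in which they were listed.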
User: "Add Stripe payment integration"
Pre-mortem Analysis: Stripe Payment Integration
If this fails in 2 weeks, it's probably because:
1. HIGH: Webhook signature validation wrong
   Impact: Payments process but our system never knows - orders stuck in limbo
   Detection: Customer complaints about "paid but not delivered"
   Mitigation: Test webhooks with Stripe CLI locally before writing business logic
2. HIGH: No idempotency keys
   Impact: Network retry causes duplicate charges
   Detection: Customer charged twice, refund requests
   Mitigation: Implement idempotency keys from the start, not retrofitted
3. HIGH: Test vs live key confusion
   Impact: Real charges in development, or test mode in production
   Detection: Angry customers or "payments not working" in prod
   Mitigation: Env validation on startup, assert key prefix matches environment
4. MED: Missing error states in UI
   Impact: User sees spinner forever, retries, double charge
   Detection: Support tickets, rage clicks in analytics
   Mitigation: Design loading/error/success states before API integration
5. MED: No payment logging
   Impact: Can't debug failed payments, refund disputes unwinnable
   Detection: "We have no record of this" situations
   Mitigation: Log every payment event with correlation ID
Adjusted Plan:
1. Set up Stripe CLI + webhook endpoint (test signature validation)
2. Implement idempotency key generation
3. Add environment validation (key prefix checks)
4. Build payment service with full logging
5. Create UI states (loading/error/success)
6. Then: actual payment flow implementation
7. End-to-end test with Stripe test cards
Proceed with this plan?
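Mitigations 2 and 3 above can be sketched in a few lines. The function names and the deterministic key format are illustrative assumptions; the `sk_test_`/`sk_live_` prefixes are Stripe's standard secret-key prefixes:

```python
def validate_stripe_key(api_key: str, environment: str) -> None:
    """Fail fast at startup if the key's mode doesn't match the environment."""
    expected = "sk_live_" if environment == "production" else "sk_test_"
    if not api_key.startswith(expected):
        raise RuntimeError(
            f"Stripe key prefix does not match environment {environment!r}"
        )

def idempotency_key(order_id: str) -> str:
    """Derive a stable idempotency key per order.

    Deterministic per logical operation: a network retry of the same order
    reuses the same key, so Stripe deduplicates it instead of charging twice.
    """
    return f"order-{order_id}"
```

A fresh random key per attempt would defeat the purpose: the retry would look like a new charge.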
User: "Add soft delete to all models"
Pre-mortem Analysis: Soft Delete Migration
If this fails in 2 weeks, it's probably because:
1. HIGH: Existing queries return "deleted" records
   Impact: Users see deleted data, privacy/legal issues
   Detection: "Why is this showing up? I deleted it"
   Mitigation: Add default scope, audit ALL existing queries before migration
2. HIGH: Unique constraints now broken
   Impact: Can't recreate record with same email/username after soft delete
   Detection: "Email already exists" on new signup
   Mitigation: Make unique constraints exclude deleted records, or use different approach
3. HIGH: No rollback path
   Impact: If migration fails midway, data integrity unknown
   Detection: 500 errors, inconsistent state
   Mitigation: Test on production clone first, have restore procedure ready
4. MED: Cascade deletes still hard-delete related records
   Impact: Parent soft-deleted but children gone forever
   Detection: Missing related data after restore
   Mitigation: Audit all ON DELETE CASCADE, convert to soft delete
5. MED: Analytics/reporting now wrong
   Impact: Counts include deleted records, metrics inflated
   Detection: "Numbers don't match the UI"
   Mitigation: Update all aggregate queries to filter deleted
Adjusted Plan:
1. Audit all existing queries (grep for affected tables)
2. Audit unique constraints and cascades
3. Design soft delete schema (deleted_at column approach)
4. Create migration with rollback script
5. Test on staging with production data clone
6. Update queries one-by-one with tests
7. Then: deploy migration
8. Monitor for "deleted showing" errors
Proceed with this plan?
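Mitigations 1 and 2 above can be sketched as follows. This assumes the `deleted_at` timestamp-column approach from the plan; the table and column names are illustrative, and the partial unique index is standard PostgreSQL:

```python
from datetime import datetime, timezone

def soft_delete(record: dict) -> dict:
    """Mark the row deleted instead of removing it (nullable deleted_at column)."""
    return {**record, "deleted_at": datetime.now(timezone.utc)}

def active(records: list[dict]) -> list[dict]:
    """Default scope: every read path must filter out soft-deleted rows."""
    return [r for r in records if r.get("deleted_at") is None]

# Mitigation for broken unique constraints (PostgreSQL): apply uniqueness
# only to live rows, so an email can be reused after a soft delete.
REUSABLE_EMAIL_INDEX = """
CREATE UNIQUE INDEX users_email_active
ON users (email)
WHERE deleted_at IS NULL;
"""
```

The default scope is the piece most often missed: adding the column is easy, but every existing query must opt in to the filter.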
User: "Fix the typo in the error message"
This is a trivial change (single string edit). Pre-mortem not needed.
Proceeding directly.
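The triage decision in this last example can be sketched as a heuristic gate; the keyword lists are an illustrative assumption, not part of the skill:

```python
# Tasks touching these concerns are never "trivial" - they map to the
# HIGH-severity criteria above (data, money, security, rollback).
RISKY_KEYWORDS = ("migration", "payment", "billing", "auth", "delete", "refactor")
TRIVIAL_KEYWORDS = ("typo", "comment", "rename label")

def needs_premortem(task: str) -> bool:
    """Skip the pre-mortem only for genuinely trivial edits (e.g., a typo fix)."""
    task_lower = task.lower()
    if any(word in task_lower for word in RISKY_KEYWORDS):
        return True
    return not any(word in task_lower for word in TRIVIAL_KEYWORDS)
```

When in doubt the gate returns True: only explicitly trivial edits skip the analysis.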