Guide incident response — detect, assess, mitigate, root cause, prevent recurrence.
Respond to $ARGUMENTS.
The cardinal rule: mitigate first, root-cause second. The goal is to stop the bleeding before you diagnose the disease. Never spend 30 minutes investigating while users are down.
Gather the symptoms (2 minutes max):
Classify severity:
| Severity | Criteria | Response time | Communication cadence |
|---|---|---|---|
| SEV-1 (Critical) | Service down, data loss, security breach, revenue impact | Immediate | Every 15 minutes |
| SEV-2 (High) | Major feature degraded, affecting many users, no workaround | < 30 min | Every 30 minutes |
| SEV-3 (Medium) | Feature degraded, workaround exists, limited user impact | < 2 hours | Every 2 hours |
| SEV-4 (Low) | Minor issue, cosmetic, single user affected | Next business day | Resolution only |
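The severity table can be encoded as a small lookup so the response time and update cadence follow mechanically from the classification. The following is a minimal sketch, not a definitive implementation: the `Symptoms` fields, the `classify` function, and the way the criteria are combined (any SEV-1 criterion is enough; SEV-2 requires degradation, many users, and no workaround) are assumptions made for illustration. The table itself does not say how to combine criteria.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SeverityPolicy:
    label: str
    response_minutes: Optional[int]  # None means "next business day"
    update_minutes: Optional[int]    # None means "update on resolution only"


# Values copied from the severity table above.
POLICIES = {
    1: SeverityPolicy("SEV-1 (Critical)", 0, 15),
    2: SeverityPolicy("SEV-2 (High)", 30, 30),
    3: SeverityPolicy("SEV-3 (Medium)", 120, 120),
    4: SeverityPolicy("SEV-4 (Low)", None, None),
}


@dataclass(frozen=True)
class Symptoms:
    # Hypothetical field names; each maps to one criterion in the table.
    service_down: bool = False
    data_loss: bool = False
    security_breach: bool = False
    revenue_impact: bool = False
    feature_degraded: bool = False
    many_users_affected: bool = False
    workaround_exists: bool = False


def classify(s: Symptoms) -> int:
    """Return the severity number (1-4) under the assumed combination rules."""
    if s.service_down or s.data_loss or s.security_breach or s.revenue_impact:
        return 1
    if s.feature_degraded and s.many_users_affected and not s.workaround_exists:
        return 2
    if s.feature_degraded:
        return 3
    return 4
```

For example, `POLICIES[classify(Symptoms(data_loss=True))]` yields the SEV-1 policy with a 15-minute update cadence. When the symptoms are ambiguous, a human judgment call should override this kind of rule.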
Rules:
Answer these questions before taking action:
Build a timeline (MANDATORY):
HH:MM UTC — [Event] — [Source of information]
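The timeline format above can be produced consistently with a small helper. This is a sketch under stated assumptions: the function names `timeline_entry` and `build_timeline` are hypothetical, and requiring timezone-aware timestamps is a design choice made here so that local times are never mislabeled as UTC.

```python
from datetime import datetime, timezone

# Separator used by the timeline format (a dash surrounded by spaces).
SEP = " \u2014 "


def timeline_entry(when: datetime, event: str, source: str) -> str:
    """Format one entry as 'HH:MM UTC <sep> event <sep> source'."""
    if when.tzinfo is None:
        # Refuse naive timestamps rather than guess their timezone.
        raise ValueError("timestamp must be timezone-aware")
    stamp = when.astimezone(timezone.utc).strftime("%H:%M UTC")
    return SEP.join([stamp, event, source])


def build_timeline(entries):
    """Sort (when, event, source) tuples by time and format each one."""
    ordered = sorted(entries, key=lambda entry: entry[0])
    return [timeline_entry(when, event, source) for when, event, source in ordered]
```

Usage, with made-up events: `build_timeline([(datetime(2024, 1, 1, 14, 5, tzinfo=timezone.utc), "Error rate alert fired", "monitoring dashboard")])` returns a one-element list whose entry starts with `14:05 UTC`. Sorting by timestamp keeps the timeline in order even when events are recorded out of sequence.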