Name: Incident Triage
Author: testified-oss


| Severity | Impact | Examples | Response Time |
|----------|--------|----------|---------------|
| SEV-1 (Critical) | Complete outage, data loss, security breach | Service down for all users, data corruption, unauthorized access | Immediate (< 15 min) |
| SEV-2 (High) | Major feature broken, significant user impact | Payment processing failing, login broken for subset | < 30 minutes |
| SEV-3 (Medium) | Degraded performance, workaround exists | Slow response times, intermittent errors | < 2 hours |
| SEV-4 (Low) | Minor issue, minimal user impact | UI glitch, non-critical feature affected | < 24 hours |
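
The response-time column above can be encoded as data so that tooling can flag a missed target. The following is a minimal sketch, not part of the original skill: the names `Severity`, `RESPONSE_TARGET`, `response_deadline`, and `is_overdue` are hypothetical, and it assumes "< N" means the first response must land strictly before the deadline.

```python
from datetime import datetime, timedelta
from enum import Enum


class Severity(Enum):
    """Severity levels from the table above."""
    SEV1 = "SEV-1"
    SEV2 = "SEV-2"
    SEV3 = "SEV-3"
    SEV4 = "SEV-4"


# Response-time targets copied from the severity table.
RESPONSE_TARGET = {
    Severity.SEV1: timedelta(minutes=15),
    Severity.SEV2: timedelta(minutes=30),
    Severity.SEV3: timedelta(hours=2),
    Severity.SEV4: timedelta(hours=24),
}


def response_deadline(severity: Severity, detected_at: datetime) -> datetime:
    """Return the time by which a first response is expected."""
    return detected_at + RESPONSE_TARGET[severity]


def is_overdue(severity: Severity, detected_at: datetime, now: datetime) -> bool:
    """True once the target has been reached without a response (assumed strict '<')."""
    return now >= response_deadline(severity, detected_at)
```

For example, a SEV-1 detected at 12:00 has a deadline of 12:15, and `is_overdue` returns `True` from 12:15 onward.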

## Impact Analysis

### Affected Components
- [ ] Frontend/UI
- [ ] API/Backend
- [ ] Database
- [ ] Third-party integrations
- [ ] Infrastructure

### User Impact
- **Users Affected:** [number/percentage]
- **Regions Affected:** [list regions]
- **Customer Tiers:** [enterprise/pro/free]

### Business Impact
- **Revenue at Risk:** $[amount]/hour
- **SLA Status:** [within/breaching]
- **Reputational Risk:** [low/medium/high]

## Root Cause Hypotheses

### Hypothesis 1: [Most Likely]
- **Theory:** [description]
- **Supporting Evidence:** [what points to this]
- **Contradicting Evidence:** [what doesn't fit]
- **Confidence:** [high/medium/low]
- **Investigation Steps:**
  1. [step 1]
  2. [step 2]

### Hypothesis 2: [Alternative]
- **Theory:** [description]
- **Supporting Evidence:** [evidence]
- **Confidence:** [level]

| Category | Indicators | First Steps |
|----------|------------|-------------|
| Deployment | Recent deploy, gradual rollout issues | Check deploy logs, rollback |
| Capacity | High CPU/memory, request timeouts | Scale up, check autoscaling |
| Dependency | External service errors, timeout patterns | Check status pages, circuit breakers |
| Data | Query errors, migration issues | Check DB logs, recent migrations |
| Configuration | Feature flag changes, config updates | Review recent config changes |
| Network | DNS issues, connectivity problems | Check network status, DNS resolution |
| Security | Auth failures, rate limiting | Review auth logs, check for attacks |
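
As a rough illustration of how the category table above could seed a first hypothesis, the sketch below matches a free-text observation against a few keywords per category. The keyword lists and the names `KEYWORDS`, `FIRST_STEPS`, and `suggest_categories` are assumptions made for this example; only the category names and first steps come from the table. Substring matching is crude and will produce false positives, so treat the output as a prompt for investigation rather than a diagnosis.

```python
# Hypothetical keywords loosely derived from the "Indicators" column.
KEYWORDS = {
    "Deployment": ("deploy", "rollout"),
    "Capacity": ("cpu", "memory", "timeout"),
    "Dependency": ("external", "timeout"),
    "Data": ("query", "migration"),
    "Configuration": ("feature flag", "config"),
    "Network": ("dns", "connectivity"),
    "Security": ("auth", "rate limit"),
}

# First steps copied from the table above.
FIRST_STEPS = {
    "Deployment": "Check deploy logs, rollback",
    "Capacity": "Scale up, check autoscaling",
    "Dependency": "Check status pages, circuit breakers",
    "Data": "Check DB logs, recent migrations",
    "Configuration": "Review recent config changes",
    "Network": "Check network status, DNS resolution",
    "Security": "Review auth logs, check for attacks",
}


def suggest_categories(observation: str) -> list[str]:
    """Return categories (in table order) with a keyword found in the observation."""
    text = observation.lower()
    return [
        category
        for category, words in KEYWORDS.items()
        if any(word in text for word in words)
    ]


for category in suggest_categories("Recent deploy followed by DNS failures"):
    print(f"{category}: {FIRST_STEPS[category]}")
```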

## Escalation Path

### Immediate Notifications
- [ ] On-call engineer: [name/team]
- [ ] Team lead: [name]
- [ ] Stakeholders: [list]

### Escalation Triggers
- If not resolved in [X] minutes → Escalate to [team/person]
- If data integrity issue confirmed → Escalate to [security/data team]
- If customer-facing SLA breach → Notify [customer success]

### Communication Plan
- **Internal:** [Slack channel, frequency]
- **External:** [Status page update needed? Customer comms?]

| Severity | Primary | Backup | Management | Customer Comms |
|----------|---------|--------|------------|----------------|
| SEV-1 | On-call + Team Lead | Director | VP within 30 min | Immediate |
| SEV-2 | On-call | Team Lead | Director if > 1hr | If > 30 min |
| SEV-3 | On-call | N/A | Daily standup | If requested |
| SEV-4 | Ticket owner | On-call | N/A | N/A |
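
The time-based cells of the matrix above can be expressed as a small lookup. This is a sketch under stated assumptions: `PRIMARY` and `notifications_due` are hypothetical names, "VP within 30 min" is read as "notify the VP right away", and the non-time-based cells (daily standup, "if requested") are deliberately left out.

```python
# Primary responders copied from the escalation matrix.
PRIMARY = {
    "SEV-1": "On-call + Team Lead",
    "SEV-2": "On-call",
    "SEV-3": "On-call",
    "SEV-4": "Ticket owner",
}


def notifications_due(severity: str, elapsed_minutes: float) -> list[str]:
    """Return who should have been notified after `elapsed_minutes` of incident time."""
    due = [PRIMARY[severity]]
    if severity == "SEV-1":
        # Assumed reading: VP and customer comms are both triggered immediately.
        due += ["VP", "Customer comms"]
    elif severity == "SEV-2":
        if elapsed_minutes > 60:   # "Director if > 1hr"
            due.append("Director")
        if elapsed_minutes > 30:   # "If > 30 min"
            due.append("Customer comms")
    return due
```

For a SEV-2 that has been open for 45 minutes, this returns the on-call engineer plus customer comms; the Director is added once the incident passes the one-hour mark.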

# Incident Triage Report

## Summary
- **Incident ID:** [INC-XXXX]
- **Severity:** [SEV-1/2/3/4]
- **Status:** [investigating/identified/monitoring/resolved]
- **Started:** [timestamp]
- **Duration:** [ongoing/X hours]

## What Happened
[Brief description of the incident]

## Impact
- **Users Affected:** [count/percentage]
- **Services Affected:** [list]
- **Business Impact:** [description]

## Root Cause
- **Confirmed/Suspected:** [status]
- **Category:** [deployment/capacity/dependency/etc.]
- **Description:** [detailed explanation]

## Timeline
| Time | Event |
|------|-------|
| HH:MM | Incident detected |
| HH:MM | Triage started |
| HH:MM | Root cause identified |
| HH:MM | Mitigation applied |

## Actions Taken
1. [Action 1]
2. [Action 2]

## Escalations
- [Who was notified and when]

## Next Steps
- [ ] [Immediate action]
- [ ] [Follow-up action]
- [ ] [Post-incident review scheduled]
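
Once the Timeline table in the report is filled in, the intervals between events (time to triage, time to mitigation) follow directly from the `HH:MM` values. The sketch below shows one way to compute them. The timestamps are invented for illustration, `minutes_between` and `minutes_since_detection` are hypothetical helpers, and the code assumes every event falls within 24 hours of detection.

```python
from datetime import datetime


def minutes_between(start: str, end: str) -> int:
    """Minutes from `start` to `end` (both HH:MM), wrapping past midnight."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    # The modulo handles an end time on the following day (assumes < 24 h apart).
    return int(delta.total_seconds() // 60) % (24 * 60)


# Illustrative values only; event names match the template's Timeline table.
timeline = [
    ("14:02", "Incident detected"),
    ("14:07", "Triage started"),
    ("14:40", "Root cause identified"),
    ("15:05", "Mitigation applied"),
]


def minutes_since_detection(rows: list[tuple[str, str]], event: str) -> int:
    """Minutes between 'Incident detected' and the named event."""
    times = {name: time for time, name in rows}
    return minutes_between(times["Incident detected"], times[event])


print(minutes_since_detection(timeline, "Triage started"))
print(minutes_since_detection(timeline, "Mitigation applied"))
```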

Incident Triage

When to Use

When NOT to Use

Triage Process

1. Initial Assessment (First 5 minutes)

2. Severity Assessment

Severity Decision Factors

3. Impact Analysis

4. Root Cause Hypothesis

Common Root Cause Categories

5. Escalation Recommendations

Escalation Matrix

Triage Report Template

Quick Triage Checklist

Best Practices
