Blameless postmortem writing — timeline, root cause analysis, corrective actions. Use when writing an incident postmortem, conducting a post-incident review, building a postmortem template, or coaching a team toward blameless culture.
A good postmortem is a learning artifact, not a punishment. It tells the story of an incident factually, traces the root cause to a systemic gap (not a person), and produces a small number of owned, time-bound corrective actions.
Every postmortem should include, in order: title and metadata, summary, impact, timeline, root cause analysis, contributing factors, what went well, corrective actions, and lessons learned. The order matters — facts first, analysis second, actions last.
Use UTC timestamps. Format each line as
HH:MM UTC — Event description (who/what). Include first alert,
escalation points, key diagnostic steps, mitigations applied, and
full resolution. Separate detection time (first signal) from response
time (first human action) and call out delays explicitly. The
timeline is factual, not interpretive — analysis goes in the root
cause section.
Start with the observable failure and ask "why?" iteratively. Stop when you reach a systemic or process issue, not a human mistake. If the chain branches, document all branches. Example:
The root cause is "no migration performance review", not "Alice wrote a slow migration."
Each action must be specific, owned, and have a due date. Categorize
as prevent (stop it from happening), detect (catch it
faster), or mitigate (reduce impact). Format:
[P/D/M] Action — @owner — due YYYY-MM-DD. Cap at 5–8 actions; more
means none get done. Aim for at least one action per category. Track
completion in the team's issue tracker, not just the document.
Focus on systems and processes, not individuals. Use passive voice for human errors ("the config was deployed without review", not "Alice deployed without review"). Ask "what about our system made this mistake easy to make?" Assume everyone acted with the best information available at the time. If anyone feels blamed, the postmortem has failed regardless of wording.
Agent-specific failure modes — provider-neutral pause-and-self-check items:
# Postmortem: [Incident Title]
**Date:** YYYY-MM-DD
**Severity:** SEV-1 / SEV-2 / SEV-3
**Duration:** HH:MM (from detection to resolution)
**Author:** [Name]
**Reviewers:** [Names]
## Summary
[2-3 sentences: what happened, impact, resolution]
## Impact
- Users affected: [number or percentage]
- Duration of user impact: [time]
- Revenue impact: [estimate if applicable]
- SLA budget consumed: [percentage]
## Timeline (UTC)
- HH:MM — [First signal / alert fired]
- HH:MM — [Escalation / response began]
- HH:MM — [Key diagnostic finding]
- HH:MM — [Mitigation applied]
- HH:MM — [Full resolution confirmed]
## Root Cause Analysis
### 5 Whys
1. Why? ...
2. Why? ...
3. Why? ...
4. Why? ...
5. Why? ...
## Contributing Factors
- [Factor that made it worse or delayed resolution]
## What Went Well
- [Thing that worked during response]
## Corrective Actions
| Type | Action | Owner | Due Date | Status |
|------|--------|-------|----------|--------|
| Prevent | [Action] | @name | YYYY-MM-DD | Open |
| Detect | [Action] | @name | YYYY-MM-DD | Open |
| Mitigate | [Action] | @name | YYYY-MM-DD | Open |
## Lessons Learned
- [Key takeaway]