Production incident handling — triage, communicate, mitigate, fix, postmortem. Use when production is broken, users are affected, an outage is in progress, or a deploy has caused regressions and the priority is restoring service.
When production breaks, restore service first and understand it later. Triage user impact, communicate early, mitigate with the fastest safe action (often a rollback), then fix the root cause and write a blameless postmortem within 48 hours.
Answer three questions before doing anything else: what is the user impact (total outage, degraded, cosmetic), what changed recently (deploy, config flip, dependency update), and is there a quick rollback option. The answers decide whether you mitigate by reverting or by patching forward.
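The revert-vs-patch decision above can be sketched as a tiny helper. This is a minimal sketch; the `Triage` fields and `mitigation_plan` name are illustrative assumptions, not a real incident tool.

```python
# Minimal triage sketch: decide between rollback and forward fix.
# All names here are illustrative, not from any real tooling.
from dataclasses import dataclass

@dataclass
class Triage:
    impact: str          # "total", "degraded", or "cosmetic"
    recent_change: bool  # deploy / config flip / dependency bump just landed?
    rollback_safe: bool  # is there a quick, safe rollback path?

def mitigation_plan(t: Triage) -> str:
    if t.recent_change and t.rollback_safe:
        return "rollback"       # fastest safe action: revert the change
    if t.impact in ("total", "degraded"):
        return "workaround"     # flag off, shed traffic, bypass cache
    return "normal-triage"      # cosmetic: no emergency action needed

print(mitigation_plan(Triage("total", True, True)))  # rollback
```

The key property: a recent change with a safe rollback short-circuits everything else, because reverting is almost always faster than diagnosing.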
Notify stakeholders immediately with what is broken, who is affected, and an ETA to next update — not an ETA to fix. Post to the status page or incident channel. Re-post on a regular cadence even if there is no progress; silence is worse than bad news.
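A recurring update with an ETA to the next update (not to the fix) can be templated like this. The message format and the 15-minute cadence are assumptions for illustration; the posting mechanism (status page, chat webhook) is out of scope here.

```python
# Hedged sketch of a stakeholder update message; cadence and wording
# are assumptions -- adapt to your status page or incident channel.
from datetime import datetime, timedelta, timezone

def status_update(broken: str, affected: str, cadence_min: int = 15) -> str:
    now = datetime.now(timezone.utc)
    nxt = now + timedelta(minutes=cadence_min)
    return (
        f"[{now:%H:%M} UTC] INCIDENT UPDATE\n"
        f"What is broken: {broken}\n"
        f"Who is affected: {affected}\n"
        f"Next update by: {nxt:%H:%M} UTC"  # ETA to next update, not to fix
    )

print(status_update("checkout API returning 500s", "all EU users"))
```

Re-send on the stated cadence even when the body is "no change since last update".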
Mitigation restores service; the fix comes later. Roll back if it is safe and quick. Apply a workaround (feature flag off, traffic shed, cache bypass) to stop the bleeding. Do not debug in production if you can reproduce the issue elsewhere.
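The feature-flag workaround mentioned above amounts to a kill switch. A minimal sketch, assuming a plain in-process dict as the flag store; real systems would read flags from a config service so the switch takes effect without a deploy.

```python
# Kill-switch sketch: turn the broken path off, fall back to the old one.
# The dict-backed flag store and function names are assumptions.
FLAGS = {"new_checkout_flow": True}

def kill_switch(flag: str) -> None:
    """Turn a feature off to stop the bleeding; the fix comes later."""
    FLAGS[flag] = False

def checkout(order_id: str) -> str:
    if FLAGS.get("new_checkout_flow"):
        raise RuntimeError("new flow is the broken path")
    return f"processed {order_id} via legacy flow"

kill_switch("new_checkout_flow")
print(checkout("ord-123"))  # processed ord-123 via legacy flow
```

Note the mitigation changes state, not code: that is what makes it faster and safer than an emergency patch.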
Once service is stable, identify the root cause, apply the fix in a normal change-management path, and deploy with extra monitoring. Resist the urge to ship the fix straight to prod under incident adrenaline — that is how you create the next incident.
Write a blameless postmortem covering the timeline, root cause analysis, what went well, what did not, and concrete corrective actions. Focus on systems and processes, never individuals. See the postmortem-writing skill for the full template.
User reports the API is returning 500s after the latest deploy. The agent checks the deploy diff, identifies the breaking change, recommends an immediate rollback while preparing a forward fix, drafts a stakeholder update, and schedules the postmortem.
Classify severity early; the level determines who responds and how fast:
| Severity | Definition | Response |
|---|---|---|
| SEV-1 | Total outage or data loss | All-hands, immediate |
| SEV-2 | Major feature down or large user subset affected | On-call + manager |
| SEV-3 | Degraded performance, workaround exists | On-call only |
| SEV-4 | Cosmetic or low-impact bug | Normal triage |
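The severity table above can be expressed as a small classifier. A hedged sketch: the levels and responses mirror the table, while the boolean inputs and function names are illustrative assumptions.

```python
# Severity classification mirroring the table; paging mechanics are
# out of scope -- this only maps conditions to a level and response.
SEVERITY_RESPONSE = {
    "SEV-1": "all-hands, immediate",
    "SEV-2": "on-call + manager",
    "SEV-3": "on-call only",
    "SEV-4": "normal triage",
}

def classify(total_outage: bool, major_feature_down: bool,
             workaround_exists: bool) -> str:
    if total_outage:
        return "SEV-1"
    if major_feature_down:
        return "SEV-2"
    if workaround_exists:
        return "SEV-3"
    return "SEV-4"

print(SEVERITY_RESPONSE[classify(False, True, False)])  # on-call + manager
```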
For small teams, one person may wear multiple hats, but the incident-commander role (coordinating and communicating) should never be combined with hands-on debugging.