Handle production incidents: diagnose, mitigate, resolve, learn from failures
I provide the technical expertise to handle production incidents effectively. I focus on rapid diagnosis, swift mitigation to restore service, and systematic resolution of the underlying issue, all while ensuring that every failure becomes a learning opportunity.
Incident Severity Classification (P0-P3):
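The criteria for each level are not spelled out here, so the following is only a minimal sketch of how a P0-P3 classification could be encoded. The `Impact` fields, the `classify` function, and every threshold are hypothetical assumptions for illustration; substitute the definitions your team has actually agreed on.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    """P0 is the most severe level, P3 the least."""
    P0 = 0
    P1 = 1
    P2 = 2
    P3 = 3


@dataclass(frozen=True)
class Impact:
    # Hypothetical inputs; use whatever signals your team really tracks.
    users_affected_fraction: float  # 0.0 to 1.0
    core_function_down: bool
    workaround_available: bool


def classify(impact: Impact) -> Severity:
    """Map an impact assessment to a severity level.

    All thresholds are illustrative assumptions, not values from this note.
    """
    if impact.core_function_down and impact.users_affected_fraction >= 0.5:
        return Severity.P0
    if impact.core_function_down or impact.users_affected_fraction >= 0.25:
        return Severity.P1
    if impact.users_affected_fraction > 0.0 and not impact.workaround_available:
        return Severity.P2
    return Severity.P3
```

Using an `IntEnum` keeps the levels ordered, so a comparison such as `severity <= Severity.P1` can drive paging or escalation rules.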
Response Sequence:
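As a rough illustration, the four stages named in the title (diagnose, mitigate, resolve, learn) can be modelled as an ordered set of phases. The `Phase` and `Incident` names are hypothetical, and the strict ordering is a simplifying assumption: in practice, mitigation often starts before diagnosis is complete.

```python
from enum import Enum
from typing import List, Optional


class Phase(Enum):
    # Order follows the title: diagnose, mitigate, resolve, learn.
    DIAGNOSE = 1
    MITIGATE = 2
    RESOLVE = 3
    LEARN = 4


class Incident:
    """Tracks which response phases have been completed (hypothetical helper)."""

    def __init__(self, title: str) -> None:
        self.title = title
        self.completed: List[Phase] = []

    def next_phase(self) -> Optional[Phase]:
        """Return the first phase not yet completed, or None when all are done."""
        for phase in Phase:
            if phase not in self.completed:
                return phase
        return None

    def complete(self, phase: Phase) -> None:
        """Mark a phase as done, rejecting out-of-order or repeated phases."""
        expected = self.next_phase()
        if phase is not expected:
            expected_name = expected.name if expected else "nothing (incident closed)"
            raise ValueError(f"expected {expected_name}, got {phase.name}")
        self.completed.append(phase)
```

A caller would create `Incident("checkout errors")` and call `complete(Phase.DIAGNOSE)`, then `complete(Phase.MITIGATE)`, and so on. Skipping straight to `Phase.RESOLVE` raises `ValueError`, which makes it harder to close an incident without the learning step.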
~/vaults/baphled/3. Resources/Knowledge Base/AI Development System/Skills/DevOps-Operations/Incident Response.md
incident-communication — Coordinating stakeholder updates
monitoring — Detection and observability
rollback-recovery — Swiftly undoing problematic changes
blameless-postmortem — Learning from technical failures
logging-observability — Using logs and traces for diagnosis