Network Partition Recovery

Use when recovering from a network partition. This runbook covers diagnosing and recovering from network partition events across distributed systems: partition detection, impact assessment, split-brain resolution, data reconciliation, connectivity restoration, and post-recovery validation to restore full cluster consistency.

Categories: Project Management

Network Partition Recovery Skill

Recover from {{ partition_type }} affecting {{ affected_systems }}.

Workflow

Phase 1 — Partition Detection and Scoping

PARTITION ASSESSMENT
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
[ ] Partition detected — timestamp: ___
[ ] Affected systems: {{ affected_systems }}
[ ] Partition type: {{ partition_type }}
[ ] Scope:
    - Nodes/services on side A: ___
    - Nodes/services on side B: ___
    - Fully isolated nodes: ___
[ ] Impact assessment:
    - Services degraded: ___
    - Services fully unavailable: ___
    - Users affected (estimated): ___
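Filling in the Scope fields above means finding the groups of nodes that can still talk to each other. The Python sketch below is minimal and rests on one assumption: you can export a reachability map (which peers each node currently receives heartbeats from) from your own monitoring. The input format and the names `partition_sides` and `scope` are hypothetical. A link counts as usable only when it works in both directions, because a one-way link usually cannot support leader election or replication.

```python
from collections import deque

# Hypothetical input format: reach[node] is the set of peers whose heartbeats
# `node` is currently receiving (export this from your own monitoring).
Reachability = dict[str, set[str]]


def partition_sides(reach: Reachability) -> list[set[str]]:
    """Return connected components, keeping only links that work both ways."""
    nodes = set(reach)
    seen: set[str] = set()
    sides: list[set[str]] = []
    for start in sorted(nodes):
        if start in seen:
            continue
        seen.add(start)
        component = {start}
        queue = deque([start])
        while queue:  # breadth-first search over bidirectional links only
            node = queue.popleft()
            for peer in reach[node]:
                bidirectional = peer in nodes and node in reach[peer]
                if bidirectional and peer not in seen:
                    seen.add(peer)
                    component.add(peer)
                    queue.append(peer)
        sides.append(component)
    # Largest group first, so it can be recorded as "side A".
    sides.sort(key=lambda s: (-len(s), sorted(s)))
    return sides


def scope(reach: Reachability) -> dict[str, list[set[str]]]:
    """Split components into multi-node sides and fully isolated nodes."""
    sides = partition_sides(reach)
    return {
        "sides": [s for s in sides if len(s) > 1],
        "isolated": [s for s in sides if len(s) == 1],
    }


if __name__ == "__main__":
    observed = {
        "n1": {"n2", "n3"},
        "n2": {"n1", "n3"},
        "n3": {"n1", "n2"},
        "n4": {"n5"},
        "n5": {"n4"},
        "n6": {"n1"},  # one-way: n1 does not hear n6, so n6 counts as isolated
    }
    print(scope(observed))
```

In this example, n1 to n3 map to side A, n4 and n5 to side B, and n6 to "Fully isolated nodes".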

Phase 2 — Split-Brain Assessment

SPLIT-BRAIN CHECK
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
[ ] Determine if split-brain has occurred:
    - Multiple leaders elected: [ ] YES  [ ] NO
    - Divergent writes detected: [ ] YES  [ ] NO
    - Quorum status:
      Side A: ___ nodes (quorum: [ ] YES  [ ] NO)
      Side B: ___ nodes (quorum: [ ] YES  [ ] NO)
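Once each side's state has been collected, the split-brain questions can be answered mechanically. The sketch below is minimal and assumes a majority-quorum system (Raft- or Paxos-style consensus), where quorum is a strict majority of the full configured membership and not of the nodes a side happens to see. `SideStatus`, its fields, and the write-ID comparison used to flag divergence are illustrative assumptions, not any particular database's API.

```python
from dataclasses import dataclass, field


@dataclass
class SideStatus:
    """Hypothetical snapshot of one side of the partition."""
    voters: int  # voting members reachable on this side
    leaders: set[str] = field(default_factory=set)  # nodes claiming leadership
    # IDs of writes accepted on this side since the partition began
    writes: set[str] = field(default_factory=set)


def quorum_size(cluster_size: int) -> int:
    """Strict majority of the full configured membership."""
    return cluster_size // 2 + 1


def split_brain_check(a: SideStatus, b: SideStatus, cluster_size: int) -> dict[str, bool]:
    q = quorum_size(cluster_size)
    return {
        # More than one node claiming leadership across the whole cluster.
        "multiple_leaders": len(a.leaders | b.leaders) > 1,
        # Assumption: divergence means EACH side holds writes the other lacks.
        # A one-sided difference is treated here as lag the other side can
        # catch up on; confirm that this holds for your system.
        "divergent_writes": bool(a.writes - b.writes) and bool(b.writes - a.writes),
        "side_a_quorum": a.voters >= q,
        "side_b_quorum": b.voters >= q,
    }


if __name__ == "__main__":
    side_a = SideStatus(voters=3, leaders={"n1"}, writes={"w1", "w2"})
    side_b = SideStatus(voters=2, leaders={"n4"}, writes={"w3"})
    print(split_brain_check(side_a, side_b, cluster_size=5))
```

If five voting members split 3/2, only side A reaches the quorum of 3. If four members split 2/2, neither side does, and the incident falls into the no-quorum row of the decision matrix.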

DECISION MATRIX
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Scenario                              | Action
No split-brain                        | Restore connectivity, verify
Split-brain, one side has quorum      | Fence minority side, restore
Split-brain, no quorum on either side | Manual intervention, pick canonical side
Divergent writes                      | Data reconciliation required
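Encoding the decision matrix lets an on-call script or checklist tool return the same action every time. This sketch makes two assumptions. First, the divergent-writes row applies in addition to whichever split-brain row matches, not instead of it. Second, if both sides report quorum, which should not happen under a consistent membership configuration, the case goes to manual intervention. The function name and action strings are illustrative.

```python
def recovery_actions(split_brain: bool, side_a_quorum: bool,
                     side_b_quorum: bool, divergent_writes: bool) -> list[str]:
    """Map the Phase 2 findings to decision-matrix actions (illustrative)."""
    actions: list[str] = []
    if not split_brain:
        actions.append("Restore connectivity, verify")
    elif side_a_quorum and not side_b_quorum:
        actions.append("Fence minority side (B), restore")
    elif side_b_quorum and not side_a_quorum:
        actions.append("Fence minority side (A), restore")
    else:
        # Neither side has quorum, or (misconfigured) both claim it:
        # a human must choose the canonical side.
        actions.append("Manual intervention, pick canonical side")
    if divergent_writes:
        # Assumption: reconciliation is required on top of the row above.
        actions.append("Data reconciliation required")
    return actions


if __name__ == "__main__":
    print(recovery_actions(split_brain=True, side_a_quorum=True,
                           side_b_quorum=False, divergent_writes=True))
```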


Counter-Rationalizations

Shortcut                                         | Counter                                     | Why
"We can skip some steps for this case"           | Adapt the workflow steps; don't skip them   | Skipped steps are where incidents and oversights originate
"The user seems to already know what to do"      | Complete all workflow phases with the user  | The workflow catches blind spots that experience alone misses
"This is a minor case, full process is overkill" | Scale the process down; don't turn it off   | Minor cases become major when handled without structure; the process scales down rather than disappearing
"I'll fill in the details later"                 | Complete each section before moving on      | Deferred details get forgotten; capturing them in real time is more accurate
"The template output isn't necessary"            | Always produce the structured output format | Structured output enables comparison, audit trails, and handoff to other teams
