Disaster Recovery

Purpose

Provide comprehensive guidance for designing disaster recovery (DR) strategies, implementing backup systems, and validating recovery procedures across databases, Kubernetes clusters, and cloud infrastructure. Enable teams to define RTO/RPO objectives, select appropriate backup tools, configure automated failover, and test DR capabilities through chaos engineering.

When to Use This Skill

Invoke this skill when:

Defining recovery time objectives (RTO) and recovery point objectives (RPO)
Implementing database backups with point-in-time recovery (PITR)
Setting up Kubernetes cluster backup and restore workflows
Configuring cross-region replication for high availability
Testing disaster recovery procedures through chaos experiments
Meeting compliance requirements (GDPR, SOC 2, HIPAA)
Automating backup monitoring and alerting
Designing multi-cloud disaster recovery architectures

Core Concepts

Use Case	Primary Tool	Alternative	Key Feature
PostgreSQL production	pgBackRest	WAL-G	PITR, compression, multi-repo
MySQL production	Percona XtraBackup	WAL-G	Hot backups, incremental
MongoDB	Atlas Backup	mongodump	Continuous backup, PITR
Kubernetes cluster	Velero	ArgoCD + Git	PV snapshots, scheduling
File/object backup	Restic	Duplicity	Encryption, deduplication
Cross-region replication	Aurora Global DB	RDS Read Replica	Active-Active capable

Pattern	RTO	RPO	Cost	Use Case
Active-Active	< 1 min	< 1 min	High	Both regions serve traffic
Active-Passive	15-60 min	5-15 min	Medium	Standby for failover
Pilot Light	10-30 min	5-15 min	Low	Minimal secondary infra
Warm Standby	5-15 min	5-15 min	Med-High	Scaled-down secondary
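The pattern table above can be read as a lookup: given the RTO and RPO a service must meet, pick the cheapest pattern whose worst-case figures fit. The following is a minimal sketch of that reading, not a definitive selection method. The names `Pattern`, `PATTERNS`, and `choose_pattern` are hypothetical, the upper end of each range in the table is treated as the worst case, and the numeric cost ranks are an assumed ordering of Low, Medium, Med-High, and High.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Pattern:
    name: str
    rto_max_min: float  # worst-case RTO in minutes (upper end of the table range)
    rpo_max_min: float  # worst-case RPO in minutes (upper end of the table range)
    cost_rank: int      # assumed ordering: 1 = Low ... 4 = High


# Values transcribed from the table above; "< 1 min" is approximated as 1.
PATTERNS = [
    Pattern("Active-Active", 1, 1, 4),
    Pattern("Active-Passive", 60, 15, 2),
    Pattern("Pilot Light", 30, 15, 1),
    Pattern("Warm Standby", 15, 15, 3),
]


def choose_pattern(rto_target_min: float, rpo_target_min: float) -> Optional[Pattern]:
    """Return the cheapest pattern whose worst-case RTO and RPO meet the targets."""
    candidates = [
        p for p in PATTERNS
        if p.rto_max_min <= rto_target_min and p.rpo_max_min <= rpo_target_min
    ]
    # min() with default=None returns None when no pattern satisfies the targets.
    return min(candidates, key=lambda p: p.cost_rank, default=None)


if __name__ == "__main__":
    for rto, rpo in [(60, 15), (20, 15), (5, 5)]:
        match = choose_pattern(rto, rpo)
        print(rto, rpo, match.name if match else "no pattern fits")
```

With the figures as listed, Pilot Light is at least as good as Active-Passive on both bounds and costs less, so this sketch never selects Active-Passive. Replace the bounds with RTO and RPO values measured in your own drills before relying on the result.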

Regulation	Retention	Requirements
GDPR	1-7 years	EU data residency, right to erasure
SOC 2	1 year+	Secure deletion, access controls
HIPAA	6 years	Encryption, PHI protection
PCI DSS	3 months-1 year	Secure deletion, quarterly reviews
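When several regulations apply to the same backup set, the longest minimum retention period governs when a backup may be deleted. The sketch below illustrates that rule using the lower end of each range in the retention table above. The names `MIN_RETENTION_DAYS`, `required_retention_days`, and `may_delete` are hypothetical, and the day counts are approximations (365 days per year, 90 days for three months). It does not model erasure requests or data residency, and it is not a substitute for legal review.

```python
from datetime import date, timedelta
from typing import Iterable

# Lower end of each retention range from the table above, in days (approximate).
MIN_RETENTION_DAYS = {
    "GDPR": 365,
    "SOC 2": 365,
    "HIPAA": 6 * 365,
    "PCI DSS": 90,
}


def required_retention_days(regulations: Iterable[str]) -> int:
    """Return the longest minimum retention among the applicable regulations."""
    regulations = list(regulations)
    unknown = [r for r in regulations if r not in MIN_RETENTION_DAYS]
    if unknown:
        raise ValueError(f"No retention figure for: {unknown}")
    return max((MIN_RETENTION_DAYS[r] for r in regulations), default=0)


def may_delete(backup_date: date, regulations: Iterable[str], today: date) -> bool:
    """True once the backup is older than the governing retention period."""
    keep_for = timedelta(days=required_retention_days(regulations))
    return today - backup_date > keep_for


if __name__ == "__main__":
    print(required_retention_days(["GDPR", "HIPAA"]))
    print(may_delete(date(2020, 1, 1), ["PCI DSS"], date(2020, 6, 1)))
    print(may_delete(date(2020, 1, 1), ["HIPAA"], date(2024, 1, 1)))
```

A pruning job could call `may_delete` for each backup before removing it, so that no backup is deleted while any applicable regulation still requires it.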

Planning Disaster Recovery

RTO and RPO Fundamentals

3-2-1 Backup Rule

Backup Types

Quick Decision Framework

Step 1: Map RTO/RPO to Strategy

Step 2: Select Backup Tools by Use Case

Database Backup Patterns

PostgreSQL with pgBackRest

MySQL with Percona XtraBackup

MongoDB Backup

Kubernetes Disaster Recovery

Velero for Cluster Backups

etcd Backup

Cloud-Specific DR Patterns

AWS

GCP

Azure

Cross-Region Replication Patterns

Testing Disaster Recovery

Chaos Engineering

Automated DR Drills

Compliance and Retention

Monitoring and Alerting

Automation and Runbooks

Integration with Other Skills

Related Skills

Skill Chaining Example

Best Practices

Do

Don't

Reference Documentation

Examples

Scripts
