Backup strategies and disaster recovery procedures for databases. Use when: database backup, backup strategy, disaster recovery, restore procedure, backup validation, point in time recovery, PITR, pg_dump, pg_basebackup, mysqldump, xtrabackup, WAL archiving, incremental backup, full backup, backup retention, backup testing, backup encryption, RTO RPO, recovery time objective, recovery point objective, offsite backup, backup monitoring, restore drill, backup automation, S3 backup, backup compression, backup integrity, DR plan.
Invoke this skill when you need to:
Before choosing any backup technology, define objectives with stakeholders:
| Objective | Definition | Example Target |
|---|---|---|
| RPO (Recovery Point Objective) | Maximum acceptable data loss window | 1 hour — lose at most 1h of data |
| RTO (Recovery Time Objective) | Maximum acceptable recovery time | 30 minutes — system back online within 30m |
RPO → Backup frequency mapping:
| RPO | Required Backup Strategy |
|---|---|
| < 5 minutes | Continuous WAL archiving / binlog streaming |
| < 1 hour | WAL archiving with hourly base backup rotation |
| < 24 hours | Daily full backup |
| < 7 days | Weekly full + daily incremental |
Checklist:
| Type | Description | Tool Examples |
|---|---|---|
| Logical backup | SQL dump of objects and data | pg_dump, mysqldump, mongodump |
| Physical / file-level backup | Binary copy of data directory | pg_basebackup, xtrabackup, file snapshot |
| Snapshot backup | Cloud volume snapshot | AWS EBS Snapshot, Azure Managed Disk Snapshot |
| WAL archiving / binlog streaming | Continuous log shipping for PITR | archive_command, pgBackRest, Barman |
| Managed backup | Cloud-native automated backup | AWS RDS automated backup, GCP Cloud SQL |
Decision guide:
| Use Case | Recommended |
|---|---|
| Small DB (< 10GB), simple restore, portability | Logical (pg_dump) |
| Large DB, fast restore, PITR required | Physical + WAL archiving |
| Cloud-hosted DB with managed service | Cloud snapshot + WAL archiving |
| Continuous RPO < 15 min | WAL archiving (streaming) |
Checklist:
Logical backup (pg_dump):
pg_dump \
--host=localhost \
--username=postgres \
--format=custom \ # compressed, parallel-restore capable
--file=/backups/mydb_$(date +%Y%m%d_%H%M%S).dump \
mydb
Physical backup (pg_basebackup):
pg_basebackup \
--host=localhost \
--username=replication_user \
--pgdata=/backups/basebackup_$(date +%Y%m%d) \
--format=tar \
--gzip \
--checkpoint=fast \
--wal-method=stream # include WAL needed to make backup consistent
WAL archiving for PITR (postgresql.conf):
archive_mode = on
archive_command = 'pgbackrest --stanza=mydb archive-push %p'
# or: aws s3 cp %p s3://my-wal-archive/%f
wal_level = replica
pgBackRest (recommended for production):
pgbackrest --stanza=mydb --type=full backup # Full backup
pgbackrest --stanza=mydb --type=incr backup # Incremental
pgbackrest --stanza=mydb --type=diff backup # Differential
Checklist:
--format=custom used for pg_dump — enables parallel restore with pg_restore -jarchive_command failure tested — PostgreSQL stops if archiving fails repeatedly (protects WAL)Logical backup:
mysqldump \
--single-transaction \ # consistent snapshot without locking (InnoDB)
--master-data=2 \ # record binlog position in dump header
--databases mydb \
| gzip > /backups/mydb_$(date +%Y%m%d).sql.gz
Physical backup (xtrabackup — preferred for large DBs):
xtrabackup --backup --target-dir=/backups/full_$(date +%Y%m%d) --user=root --password=...
xtrabackup --prepare --target-dir=/backups/full_$(date +%Y%m%d)
Binlog archiving:
# my.cnf
log_bin = /var/lib/mysql/binlog
expire_logs_days = 7 # or: binlog_expire_logs_seconds = 604800
binlog_format = ROW # required for PITR
Checklist:
--single-transaction used for mysqldump — avoids table locks on InnoDBxtrabackup used for databases > 10GB — faster and non-blockingbinlog_format = ROW confirmed — required for reliable PITRStorage:
Encryption:
# Encrypt before upload:
openssl enc -aes-256-cbc -pbkdf2 -in dump.sql -out dump.sql.enc -k "$BACKUP_PASSPHRASE"
# Or: pgBackRest built-in encryption with cipher-type=aes-256-cbc
Retention schedule:
| Period | Frequency | Retention |
|---|---|---|
| Daily | Every 24h | 7 days |
| Weekly | Every 7 days | 4 weeks |
| Monthly | Every 30 days | 12 months |
| Annual | January 1 | 7 years (compliance) |
Checklist:
PostgreSQL PITR restore:
# 1. Stop the database
pg_ctl stop -D /var/lib/postgresql/data
# 2. Restore base backup
pgbackrest --stanza=mydb --delta restore
# 3. Configure recovery target (postgresql.conf / recovery.conf)
echo "restore_command = 'pgbackrest --stanza=mydb archive-get %f %p'" >> recovery.conf
echo "recovery_target_time = '2026-03-29 14:30:00 UTC'" >> recovery.conf
# 4. Start the database — recovery applies WAL until target time
pg_ctl start -D /var/lib/postgresql/data
MySQL PITR restore:
# 1. Restore full backup
xtrabackup --copy-back --target-dir=/backups/full_20260329
# 2. Replay binlogs from the point recorded in the backup header
mysqlbinlog --start-position=<pos> /var/log/mysql/binlog.000042 ... | mysql
Checklist:
Monitoring:
Restore drill:
Checklist:
mysqldump used without --single-transaction — table locks during backup windowpg_dump used for large databases — restore time exceeds RTO