Business continuity and disaster recovery: 30-day retention, quarterly restore tests, RTO/RPO targets per ISO 27001 A.17
This skill provides systematic guidance for implementing business continuity and disaster recovery within the CIA platform, ensuring data protection aligns with business impact analysis, RTO/RPO targets, and ISO 27001 Annex A.17 requirements.
Apply this skill when:
Do NOT skip for:
| Classification | Business Impact | RPO Target | RTO Target | Backup Frequency | Retention |
|---|---|---|---|---|---|
| RESTRICTED | Extreme | <15 minutes | <1 hour | Continuous replication | 30 days minimum |
| CONFIDENTIAL | Very High | <4 hours | <4 hours | Hourly | 7 years (financial data) |
| INTERNAL | Moderate | <24 hours | <24 hours | Daily | 3 years |
| PUBLIC | Low | >24 hours | >72 hours | Weekly | Indefinite |
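The matrix above can be encoded as a lookup so that tooling can check a proposed backup interval against a classification's RPO target. This is a minimal sketch, not part of the CIA platform: the names `BackupPolicy`, `POLICIES`, and `meets_rpo` are hypothetical, and the open-ended PUBLIC targets (">24 hours", ">72 hours") are represented as `None`.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass(frozen=True)
class BackupPolicy:
    rpo: Optional[timedelta]  # None = no upper bound stated in the table
    rto: Optional[timedelta]
    frequency: str
    retention: str


# Values copied from the classification table above.
POLICIES = {
    "RESTRICTED": BackupPolicy(timedelta(minutes=15), timedelta(hours=1),
                               "Continuous replication", "30 days minimum"),
    "CONFIDENTIAL": BackupPolicy(timedelta(hours=4), timedelta(hours=4),
                                 "Hourly", "7 years (financial data)"),
    "INTERNAL": BackupPolicy(timedelta(hours=24), timedelta(hours=24),
                             "Daily", "3 years"),
    "PUBLIC": BackupPolicy(None, None, "Weekly", "Indefinite"),
}


def meets_rpo(classification: str, backup_interval: timedelta) -> bool:
    """Return True if the interval keeps worst-case data loss within the RPO.

    Assumption: worst-case loss approaches (but does not exceed) the backup
    interval, so an interval equal to the target is treated as acceptable.
    """
    policy = POLICIES[classification.upper()]
    if policy.rpo is None:
        return True
    return backup_interval <= policy.rpo
```

For example, `meets_rpo("RESTRICTED", timedelta(hours=1))` is `False`, which is why the table calls for continuous replication at that level.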
graph TD
START[💾 Backup Need] --> CLASSIFY{🏷️ Data Classification?}
CLASSIFY -->|RESTRICTED| CRITICAL[🔴 Critical Backup<br/>Continuous Replication]
CLASSIFY -->|CONFIDENTIAL| HIGH[🟠 High Priority<br/>Hourly Backups]
CLASSIFY -->|INTERNAL| MEDIUM[🟡 Medium Priority<br/>Daily Backups]
CLASSIFY -->|PUBLIC| STANDARD[🟢 Standard<br/>Weekly Backups]
CRITICAL --> IMPACT{📊 Business Impact?}
HIGH --> IMPACT
MEDIUM --> IMPACT
STANDARD --> IMPACT
IMPACT -->|Financial System| FINANCE[💰 Financial Data<br/>7-year retention]
IMPACT -->|Core Operations| CORE[🏗️ Core Systems<br/>1-year retention]
IMPACT -->|Support Functions| SUPPORT[🛠️ Support Systems<br/>3-month retention]
FINANCE --> METHOD{🔧 Backup Method?}
CORE --> METHOD
SUPPORT --> METHOD
METHOD -->|Database| RDS[📊 AWS RDS<br/>Automated Snapshots]
METHOD -->|Files| S3[📁 AWS S3<br/>Versioning + Lifecycle]
METHOD -->|Infrastructure| IaC[🏗️ Infrastructure as Code<br/>Git + CloudFormation]
RDS --> TEST[🧪 Quarterly Restore Test]
S3 --> TEST
IaC --> TEST
TEST --> MONITOR[📈 Monitoring & Alerts]
style CRITICAL fill:#D32F2F
style HIGH fill:#FF9800
style MEDIUM fill:#FDD835
style STANDARD fill:#4CAF50
Definition: Complete copy of all data at a point in time.
Use Cases:
AWS Implementation:
# CloudFormation template for full database backup
Resources:
  DatabaseFullBackupFunction:
    Type: AWS::Lambda::Function
    Properties:
      FunctionName: cia-database-full-backup
      Runtime: python3.12
      Handler: index.lambda_handler
      Role: !GetAtt BackupFunctionRole.Arn
      Timeout: 900  # 15 minutes
      Environment:
        Variables:
          RDS_INSTANCE_ID: !Ref CIADatabase
          BACKUP_BUCKET: !Ref BackupBucket
      Code:
        ZipFile: |
          import boto3
          import datetime
          import os

          rds = boto3.client('rds')

          def lambda_handler(event, context):
              """
              Create full RDS snapshot
              """
              instance_id = os.environ['RDS_INSTANCE_ID']
              # Use UTC so snapshot names line up with the UTC schedule below
              timestamp = datetime.datetime.now(datetime.timezone.utc).strftime('%Y%m%d-%H%M%S')
              snapshot_id = f"{instance_id}-full-{timestamp}"

              # Create snapshot
              response = rds.create_db_snapshot(
                  DBSnapshotIdentifier=snapshot_id,
                  DBInstanceIdentifier=instance_id,
                  Tags=[
                      {'Key': 'BackupType', 'Value': 'Full'},
                      {'Key': 'CreatedBy', 'Value': 'Automated'},
                      {'Key': 'Retention', 'Value': '30days'}
                  ]
              )

              print(f"Full backup initiated: {snapshot_id}")
              return {
                  'statusCode': 200,
                  'body': snapshot_id
              }

  # Schedule monthly full backups
  FullBackupSchedule:
    Type: AWS::Events::Rule
    Properties:
      Name: cia-monthly-full-backup
      Description: Monthly full database backup
      ScheduleExpression: cron(0 2 1 * ? *)  # 1st of month at 2 AM UTC
      State: ENABLED
      Targets:
        - Arn: !GetAtt DatabaseFullBackupFunction.Arn
          Id: FullBackupTarget

  # Allow the scheduled rule to invoke the function; without this
  # permission the rule fires but the Lambda is never called.
  FullBackupInvokePermission:
    Type: AWS::Lambda::Permission
    Properties:
      FunctionName: !Ref DatabaseFullBackupFunction
      Action: lambda:InvokeFunction
      Principal: events.amazonaws.com
      SourceArn: !GetAtt FullBackupSchedule.Arn
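Snapshots created with `create_db_snapshot` are manual snapshots, which RDS keeps until they are explicitly deleted, so the `Retention: 30days` tag needs something to enforce it. The following is a minimal sketch of the selection logic only; `expired_snapshot_ids` is a hypothetical helper, and it assumes its input has the shape of the `DBSnapshots` list returned by boto3's `describe_db_snapshots`.

```python
from datetime import datetime, timedelta, timezone


def expired_snapshot_ids(snapshots, now, retention_days=30):
    """Return identifiers of snapshots older than the retention window.

    `snapshots` is a list of dicts with `DBSnapshotIdentifier` and
    `SnapshotCreateTime` (timezone-aware datetime). Entries without a
    creation time (for example, snapshots still being created) are skipped.
    """
    cutoff = now - timedelta(days=retention_days)
    return [
        s["DBSnapshotIdentifier"]
        for s in snapshots
        if "SnapshotCreateTime" in s and s["SnapshotCreateTime"] < cutoff
    ]


if __name__ == "__main__":
    sample = [
        {"DBSnapshotIdentifier": "cia-production-db-full-20240101-020000",
         "SnapshotCreateTime": datetime(2024, 1, 1, 2, 0, tzinfo=timezone.utc)},
        {"DBSnapshotIdentifier": "cia-production-db-full-20240220-020000",
         "SnapshotCreateTime": datetime(2024, 2, 20, 2, 0, tzinfo=timezone.utc)},
    ]
    print(expired_snapshot_ids(sample, datetime(2024, 3, 1, tzinfo=timezone.utc)))
```

In a cleanup Lambda, each returned identifier could then be passed to `rds.delete_db_snapshot(DBSnapshotIdentifier=...)`. Financial data subject to the 7-year retention rule would need to be excluded from such a job.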
Definition: Maximum acceptable time to restore service after an outage.
RTO Tiers:
| RTO Level | Time Window | Business Function Example | Implementation |
|---|---|---|---|
| Instant | <5 minutes | Financial transactions | Multi-AZ failover |
| Critical | 5-60 minutes | Core database | Automated failover |
| High | 1-4 hours | Application services | Blue-green deployment |
| Medium | 4-24 hours | Analytics systems | Manual restore from backup |
| Standard | >24 hours | Historical archives | Restore on demand |
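To make the tiers above usable in reports, a measured recovery time can be mapped to its tier name. This is an illustrative sketch: `rto_tier` is a hypothetical helper, and because the table's ranges share their endpoints, it assumes a boundary value belongs to the faster tier (exactly 60 minutes is "Critical"), except that "Instant" is strictly under 5 minutes as the table states.

```python
from datetime import timedelta

# Upper bounds from the RTO table; "Standard" has no upper bound.
BOUNDED_RTO_TIERS = [
    (timedelta(hours=1), "Critical"),
    (timedelta(hours=4), "High"),
    (timedelta(hours=24), "Medium"),
]


def rto_tier(recovery_time: timedelta) -> str:
    """Map a measured or planned recovery time to its RTO tier name."""
    if recovery_time < timedelta(minutes=5):
        return "Instant"
    for upper_bound, name in BOUNDED_RTO_TIERS:
        if recovery_time <= upper_bound:
            return name
    return "Standard"


if __name__ == "__main__":
    for minutes in (2, 45, 90, 600, 2000):
        print(minutes, "min ->", rto_tier(timedelta(minutes=minutes)))
```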
RTO Configuration Example:
# Multi-AZ RDS for instant failover (RTO <5 minutes)
CIADatabase:
  Type: AWS::RDS::DBInstance
  Properties:
    DBInstanceIdentifier: cia-production-db
    Engine: postgres
    EngineVersion: "18.3"
    DBInstanceClass: db.t3.medium
    AllocatedStorage: 100
    StorageType: gp3
    StorageEncrypted: true
    KmsKeyId: !Ref DatabaseEncryptionKey

    # Multi-AZ for high availability (automatic failover)
    MultiAZ: true

    # Automated backups for point-in-time recovery
    BackupRetentionPeriod: 35
    PreferredBackupWindow: "03:00-04:00"

    # Deletion protection
    DeletionProtection: true

    Tags:
      - Key: RTO
        Value: Critical-5to60min
      - Key: RPO
        Value: NearRealtime-1to15min
      - Key: BusinessImpact
        Value: VeryHigh
Definition: Maximum acceptable data loss measured in time.
RPO Tiers:
| RPO Level | Data Loss Window | Business Function Example | Backup Frequency |
|---|---|---|---|
| Zero Loss | <1 minute | Financial records | Synchronous replication |
| Near Real-time | 1-15 minutes | Core database | Continuous backup |
| Minimal | 15-60 minutes | Application data | 15-minute snapshots |
| Hourly | 1-4 hours | User activity logs | Hourly backups |
| Daily | 4-24 hours | Analytics data | Daily backups |
| Extended | >24 hours | Archived data | Weekly backups |
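Since RPO is a data loss window, the window actually achieved can be estimated from backup timestamps: the largest gap between consecutive backups is the most data that could have been lost in that period. The sketch below uses hypothetical helper names (`worst_case_data_loss`, `rpo_tier`) and the same boundary assumption as the table allows: "Zero Loss" is strictly under one minute, and other boundary values fall into the tighter tier.

```python
from datetime import datetime, timedelta

# Upper bounds from the RPO table; "Extended" has no upper bound.
BOUNDED_RPO_TIERS = [
    (timedelta(minutes=15), "Near Real-time"),
    (timedelta(hours=1), "Minimal"),
    (timedelta(hours=4), "Hourly"),
    (timedelta(hours=24), "Daily"),
]


def worst_case_data_loss(backup_times):
    """Largest gap between consecutive backups (input order does not matter)."""
    ordered = sorted(backup_times)
    if len(ordered) < 2:
        raise ValueError("need at least two backup timestamps")
    return max(later - earlier for earlier, later in zip(ordered, ordered[1:]))


def rpo_tier(loss_window: timedelta) -> str:
    """Map a data loss window to its RPO tier name."""
    if loss_window < timedelta(minutes=1):
        return "Zero Loss"
    for upper_bound, name in BOUNDED_RPO_TIERS:
        if loss_window <= upper_bound:
            return name
    return "Extended"


if __name__ == "__main__":
    backups = [
        datetime(2024, 1, 1, 0, 0),
        datetime(2024, 1, 1, 1, 0),
        datetime(2024, 1, 1, 2, 30),
    ]
    gap = worst_case_data_loss(backups)
    print(gap, rpo_tier(gap))
```

This only covers gaps between backups that exist; the time elapsed since the most recent backup should be checked separately, for example by alerting when it exceeds the target.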
Objective: Verify backup integrity and validate RTO/RPO targets.
Frequency: Quarterly minimum (meets the regular-interval verification required by ISO 27001 A.17.1.3)
Test Checklist:
Automated Restore Test Script:
#!/bin/bash
# Quarterly backup restore test
# Tests RTO/RPO compliance and backup integrity
set -euo pipefail
TEST_DATE=$(date +%Y%m%d-%H%M%S)
TEST_REPORT="backup-restore-test-${TEST_DATE}.md"
TEST_INSTANCE="cia-restore-test-${TEST_DATE}"
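log() {
  # Timestamp the console output only; the report receives the raw line so
  # that Markdown headings and table rows stay valid.
  echo "[$(date -u +"%Y-%m-%d %H:%M:%S UTC")] $*"
  echo "$*" >> "${TEST_REPORT}"
}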
# Start restore test
log "# Quarterly Backup Restore Test"
log ""
log "**Test Date**: $(date -u +"%Y-%m-%d %H:%M:%S UTC")"
log "**Tester**: CEO"
log "**Test Instance**: ${TEST_INSTANCE}"
log ""
# Step 1: Identify latest backup
log "## Step 1: Identify Latest Backup"
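# Only consider snapshots that have finished; one still being created cannot be restored
SNAPSHOT_ID=$(aws rds describe-db-snapshots \
  --db-instance-identifier cia-production-db \
  --query 'DBSnapshots[?Status==`available`] | sort_by(@, &SnapshotCreateTime) | [-1].DBSnapshotIdentifier' \
  --output text)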
log "- Latest snapshot: ${SNAPSHOT_ID}"
SNAPSHOT_TIME=$(aws rds describe-db-snapshots \
--db-snapshot-identifier "${SNAPSHOT_ID}" \
--query 'DBSnapshots[0].SnapshotCreateTime' \
--output text)
log "- Snapshot time: ${SNAPSHOT_TIME}"
log ""
# Step 2: Restore snapshot to test instance
log "## Step 2: Restore Database"
START_TIME=$(date +%s)
log "- Initiating restore..."
# Boolean options are passed as flags (--no-publicly-accessible), not as "false"
aws rds restore-db-instance-from-db-snapshot \
  --db-instance-identifier "${TEST_INSTANCE}" \
  --db-snapshot-identifier "${SNAPSHOT_ID}" \
  --db-instance-class db.t3.small \
  --no-publicly-accessible \
  --no-multi-az \
  --tags Key=Purpose,Value=RestoreTest Key=TestDate,Value="${TEST_DATE}"
# Wait for instance to be available
log "- Waiting for instance to become available..."
aws rds wait db-instance-available --db-instance-identifier "${TEST_INSTANCE}"
END_TIME=$(date +%s)
RESTORE_DURATION=$((END_TIME - START_TIME))
log "- ✅ Restore completed in ${RESTORE_DURATION} seconds"
log ""
# Step 3: Verify data integrity
log "## Step 3: Data Integrity Verification"
# Get endpoint
DB_ENDPOINT=$(aws rds describe-db-instances \
--db-instance-identifier "${TEST_INSTANCE}" \
--query 'DBInstances[0].Endpoint.Address' \
--output text)
log "- Database endpoint: ${DB_ENDPOINT}"
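# Verify row counts
# psql must authenticate non-interactively (for example via PGPASSWORD or ~/.pgpass).
# n_live_tup is a statistics estimate and may be stale or zero right after a restore;
# use COUNT(*) on key tables when exact figures are required.
log "- Verifying table row counts..."
psql -h "${DB_ENDPOINT}" -U cia_user -d cia_database -c "\
  SELECT schemaname, relname AS tablename, n_live_tup AS row_count \
  FROM pg_stat_user_tables \
  ORDER BY n_live_tup DESC \
  LIMIT 10;" | tee -a "${TEST_REPORT}"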
# Step 4: Cleanup
log "## Step 4: Cleanup"
log "- Deleting test instance..."
aws rds delete-db-instance \
--db-instance-identifier "${TEST_INSTANCE}" \
--skip-final-snapshot \
--delete-automated-backups
log "- ✅ Test instance cleanup initiated"
log ""
# Summary
log "## Test Summary"
log ""
log "| Metric | Target | Actual | Status |"
log "|--------|--------|--------|--------|"
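# Derive pass/fail from the measured duration (4 hours = 14400 seconds)
if (( RESTORE_DURATION < 14400 )); then RTO_STATUS="✅ Pass"; else RTO_STATUS="❌ Fail"; fi
log "| RTO | <4 hours | $((RESTORE_DURATION / 60)) minutes | ${RTO_STATUS} |"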
log "| Data Integrity | 100% | Verified | ✅ Pass |"
log ""
echo "✅ Restore test completed. Report: ${TEST_REPORT}"
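The script above prints row counts from the restored instance but does not compare them with anything, so the "Data Integrity" row is only as strong as a manual review. One way to tighten it is to compare the restored counts against a baseline captured from production. This is a sketch under assumptions: `row_count_mismatches` is a hypothetical helper, both inputs are plain `{table_name: row_count}` dictionaries gathered elsewhere, and a small tolerance allows for rows written after the snapshot was taken.

```python
def row_count_mismatches(baseline, restored, tolerance=0.0):
    """Return tables whose restored row count deviates from the baseline.

    `tolerance` is a fraction of the baseline count (0.05 = 5 percent).
    A table missing from `restored` is reported with an actual count of None.
    The result maps table name -> (expected, actual).
    """
    mismatches = {}
    for table, expected in baseline.items():
        actual = restored.get(table)
        if actual is None:
            mismatches[table] = (expected, None)
        elif abs(actual - expected) > expected * tolerance:
            mismatches[table] = (expected, actual)
    return mismatches


if __name__ == "__main__":
    # Table names and counts below are made-up sample data.
    production = {"documents": 1000, "votes": 500, "audit_log": 10}
    restored = {"documents": 1000, "votes": 450}
    print(row_count_mismatches(production, restored, tolerance=0.05))
```

An empty result would justify marking the integrity row as passed; anything else should fail the test and be listed in the report.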
| Classification | Retention Period | Rationale | Disposal Method |
|---|---|---|---|
| RESTRICTED | Minimum required | Compliance, immediate disposal after expiry | Secure deletion (multi-pass overwrite) |
| CONFIDENTIAL | 7 years | Financial/legal requirements (Swedish law) | Secure deletion with audit trail |
| INTERNAL | 3 years | Operational history | Standard deletion |
| PUBLIC | Indefinite | Historical value, public interest | Standard deletion (if needed) |
BackupBucket:
  Type: AWS::S3::Bucket
  Properties:
    BucketName: cia-backups
    BucketEncryption:
      ServerSideEncryptionConfiguration:
        - ServerSideEncryptionByDefault:
            SSEAlgorithm: aws:kms
            KMSMasterKeyID: !Ref BackupEncryptionKey
    VersioningConfiguration:
      Status: Enabled
    LifecycleConfiguration:
      Rules:
        # CONFIDENTIAL financial data: 7-year retention
        - Id: ConfidentialFinancialRetention
          Status: Enabled
          Prefix: confidential/financial/
          ExpirationInDays: 2555  # 7 years
          NoncurrentVersionExpirationInDays: 90
          Transitions:
            - TransitionInDays: 90
              StorageClass: STANDARD_IA
            - TransitionInDays: 365
              StorageClass: GLACIER

        # INTERNAL data: 3-year retention
        - Id: InternalDataRetention
          Status: Enabled
          Prefix: internal/
          ExpirationInDays: 1095  # 3 years
          NoncurrentVersionExpirationInDays: 30
          Transitions:
            - TransitionInDays: 30
              StorageClass: STANDARD_IA
            - TransitionInDays: 180
              StorageClass: GLACIER
    PublicAccessBlockConfiguration:
      BlockPublicAcls: true
      BlockPublicPolicy: true
      IgnorePublicAcls: true
      RestrictPublicBuckets: true
    Tags:
      - Key: Purpose
        Value: BackupStorage
      - Key: DataClassification
        Value: Mixed
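The lifecycle rules above are easier to review when the day counts are checked in code. The sketch below models which storage class an object is expected to be in at a given age and confirms that the expiration values match the stated retention in years. `storage_class_at` and the two rule constants are hypothetical names; S3 applies lifecycle actions asynchronously, so real transitions may lag these day counts.

```python
def storage_class_at(age_days, transitions, expiration_days):
    """Return the expected storage class for an object of the given age.

    `transitions` is a list of (days, storage_class) pairs. Objects start in
    STANDARD; None means the object has reached its expiration age.
    """
    if age_days >= expiration_days:
        return None
    current = "STANDARD"
    for days, storage_class in sorted(transitions):
        if age_days >= days:
            current = storage_class
    return current


# Day counts copied from the lifecycle rules above.
CONFIDENTIAL_FINANCIAL = ([(90, "STANDARD_IA"), (365, "GLACIER")], 2555)
INTERNAL = ([(30, "STANDARD_IA"), (180, "GLACIER")], 1095)

if __name__ == "__main__":
    transitions, expiry = CONFIDENTIAL_FINANCIAL
    for age in (10, 90, 400, 2555):
        print(age, storage_class_at(age, transitions, expiry))
    # Retention sanity check: 7 and 3 years counted as 365-day years.
    print(7 * 365 == 2555, 3 * 365 == 1095)
```

Counting 365-day years ignores leap days, so the 7-year rule expires objects about two days short of seven calendar years; whether that matters depends on how the legal retention period is measured.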
Trigger: Application detects data integrity issues, corrupted records.
Recovery Procedure:
Immediate Actions (0-15 minutes)
Point-in-Time Recovery (15-60 minutes)
Application Cutover (60-90 minutes)
Post-Recovery (90+ minutes)
Expected RTO: 2 hours
Expected RPO: <15 minutes (point-in-time recovery)
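For the point-in-time recovery step, the key decision is the restore timestamp: it has to precede the moment the corruption began. The sketch below only builds the request parameters, using keyword names from boto3's `restore_db_instance_to_point_in_time`; the helper `build_pitr_request`, the five-minute safety margin, and the target instance name in the example are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone


def build_pitr_request(source_id, target_id, corruption_started_at,
                       safety_margin=timedelta(minutes=5)):
    """Build keyword arguments for an RDS point-in-time restore.

    The restore time is placed `safety_margin` before the corruption began,
    so the restored copy predates the first bad write.
    """
    if corruption_started_at.tzinfo is None:
        raise ValueError("corruption_started_at must be timezone-aware")
    restore_time = (corruption_started_at - safety_margin).astimezone(timezone.utc)
    return {
        "SourceDBInstanceIdentifier": source_id,
        "TargetDBInstanceIdentifier": target_id,
        "RestoreTime": restore_time,
        "PubliclyAccessible": False,
    }


if __name__ == "__main__":
    request = build_pitr_request(
        "cia-production-db",
        "cia-pitr-restore",  # hypothetical name for the restored instance
        datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc),
    )
    print(request)
```

The resulting dictionary could be passed as `boto3.client("rds").restore_db_instance_to_point_in_time(**request)`. The restore creates a new instance alongside the original, which is why the procedure has a separate application cutover phase.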
Trigger: AWS region unavailable, all services unreachable.
Recovery Procedure:
Immediate Actions (0-30 minutes)
Database Recovery (30-120 minutes)
Application Deployment (120-180 minutes)
Validation (180-240 minutes)
Expected RTO: 4 hours
Expected RPO: <4 hours (cross-region snapshot lag)
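The phase windows in this procedure should add up to the stated RTO, and that arithmetic is worth checking whenever a phase is lengthened. The sketch below encodes the four phases as (name, start, end) in minutes and verifies that they are contiguous and fit within the 4-hour target. `total_minutes` and `REGION_FAILURE_PHASES` are hypothetical names.

```python
# Phase windows in minutes, copied from the region-failure procedure above.
REGION_FAILURE_PHASES = [
    ("Immediate Actions", 0, 30),
    ("Database Recovery", 30, 120),
    ("Application Deployment", 120, 180),
    ("Validation", 180, 240),
]


def total_minutes(phases):
    """Return the end of the last phase, checking phases are contiguous."""
    previous_end = 0
    for name, start, end in phases:
        if start != previous_end:
            raise ValueError(f"gap or overlap before phase {name!r}")
        if end <= start:
            raise ValueError(f"phase {name!r} has no duration")
        previous_end = end
    return previous_end


if __name__ == "__main__":
    planned = total_minutes(REGION_FAILURE_PHASES)
    rto_target = 4 * 60  # Expected RTO: 4 hours
    print(f"planned {planned} min, target {rto_target} min, "
          f"within target: {planned <= rto_target}")
```

The plan consumes the full 240 minutes with no slack, so any overrun in a single phase would breach the 4-hour RTO.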
Control Objective: The organization shall establish, document, implement and maintain processes, procedures and controls to ensure the required level of continuity for information security during an adverse situation.
Implementation:
Control Objective: The organization shall verify the established and implemented information security continuity controls at regular intervals.
Implementation:
PR.IP-4: Backups of information conducted, maintained, tested
RC.RP-1: Recovery plan executed during or after incident
CIS Control 11: Data Recovery