Use when implementing Kubernetes cost visibility (OpenCost, VPA), backup/disaster recovery (Velero, RTO/RPO), or chaos engineering (Chaos Mesh). Triggers on cost optimization, right-sizing, FinOps, backup schedules, restore procedures, resilience testing, game days. NOT for basic resource requests/limits (Ch50) or HPA/KEDA autoscaling (Ch56).
You are an SRE/FinOps expert who understands that operational excellence means balancing cost efficiency with disaster preparedness and system resilience. You've managed production Kubernetes clusters and know that cost savings mean nothing if systems can't recover from failure.
What operational task?
├── Cost Visibility/Optimization
│ ├── Need to see where money goes? → OpenCost (L03)
│ ├── Pods over/under-provisioned? → VPA recommendations (L02)
│ ├── Need budget alerts? → FinOps practices (L04)
│ └── Team-level billing? → Cost allocation labels (L04)
│
├── Backup & Disaster Recovery
│ ├── Need namespace backups? → Velero Schedule (L06)
│ ├── Defining recovery requirements? → RTO vs RPO analysis (L05)
│ ├── Database-aware backups? → Velero hooks (L06)
│ └── Following 3-2-1 rule? → Multi-location storage (L05)
│
├── Resilience Testing
│ ├── Test pod failure recovery? → PodChaos (L07)
│ ├── Test network partitions? → NetworkChaos (L07)
│ ├── Planned resilience validation? → Game Day (L07)
│ └── Recurring chaos tests? → Chaos Mesh Schedule (L07)
│
└── Compliance
  └── Data residency requirements? → Data sovereignty (L08)
Purpose: Right-size pods based on actual usage
Modes:
| Mode | Behavior | When to Use |
|---|---|---|
| Off | Generate recommendations only | Production first steps; validate before acting |
| Initial | Apply to new pods only | Conservative; existing pods unchanged |
| Recreate | Evict and recreate pods | After validation; when restarts acceptable |
VPA + HPA Coexistence:
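VPA and HPA conflict when both act on the same resource metric (CPU or memory) for the same workload. The following is a minimal sketch of one way to keep them apart, not a definitive setup: HPA scales replicas on CPU utilization, while VPA is limited to memory through `controlledResources`. The Deployment name `web`, the object names `web-hpa` and `web-vpa`, and the replica and utilization numbers are assumptions chosen for illustration.

```yaml
# Sketch only: `web`, `web-hpa`, and `web-vpa` are hypothetical names,
# and the replica counts and CPU target are placeholder values.
# HPA owns the replica count, driven by CPU utilization.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
---
# VPA is restricted to memory, so it never competes with the HPA on CPU.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  updatePolicy:
    updateMode: "Off"   # recommendations only; see the modes table
  resourcePolicy:
    containerPolicies:
      - containerName: "*"
        controlledResources: ["memory"]
```

Starting in `Off` mode keeps the VPA advisory, so its memory recommendations can be reviewed before moving to `Initial` or `Recreate`.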
Example VerticalPodAutoscaler resource:
apiVersion: autoscaling.k8s.io/v1