Optimize cloud infrastructure costs through FinOps practices, commitment discounts, right-sizing, and automated cost management. Use when reducing cloud spend, implementing budget controls, or establishing cost visibility across AWS, Azure, GCP, and Kubernetes environments.
Cloud cost optimization transforms uncontrolled spending into strategic resource allocation through the FinOps lifecycle: Inform, Optimize, and Operate. This skill provides decision frameworks for commitment-based discounts (Reserved Instances, Savings Plans), right-sizing strategies, Kubernetes cost management, and automated cost governance across multi-cloud environments.
Invoke cost-optimization when:
┌─────────────────────────────────────────────────────┐
│  INFORM  →  OPTIMIZE  →  OPERATE  (continuous loop) │
│     ↓           ↓           ↓                       │
│ Visibility   Action     Automation                  │
└─────────────────────────────────────────────────────┘
Inform Phase: Establish cost visibility
Optimize Phase: Take action on cost drivers
Operate Phase: Automate and govern
For detailed FinOps maturity models and organizational structures, see references/finops-foundations.md.
Reserved Instances (RIs): 40-72% discount for 1- or 3-year commitments
Savings Plans: Flexible compute commitments
GCP Committed Use Discounts (CUDs): 25-70% discount
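A useful sanity check for any of these commitments is the break-even utilization. A commitment is billed for every hour at (1 - discount) of the on-demand rate, so it only beats on-demand when the resource actually runs for more than (1 - discount) of the hours. The sketch below illustrates that arithmetic. The function names and the sample figures are assumptions made for illustration, and upfront-payment options and marketplace resale are ignored.

```python
def break_even_utilization(discount: float) -> float:
    """Fraction of hours a resource must run for a commitment to beat on-demand.

    discount is the commitment discount as a fraction (0.40 for 40%).
    """
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in [0, 1)")
    return 1.0 - discount


def commitment_savings(on_demand_hourly: float, discount: float,
                       utilization: float, hours: int = 8760) -> float:
    """Savings (positive) or loss (negative) versus on-demand over `hours`.

    The commitment is billed for every hour; on-demand is billed only for
    the hours actually used (utilization is a fraction of total hours).
    """
    committed_cost = on_demand_hourly * (1.0 - discount) * hours
    on_demand_cost = on_demand_hourly * utilization * hours
    return on_demand_cost - committed_cost
```

At a 40% discount the break-even point is 60% utilization: a workload that runs only half the time loses money on the commitment, while an always-on workload saves the full 40%.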
Decision Framework:
Reserve when:
├─ Workload is production-critical (24/7 uptime required)
├─ Usage is predictable (stable baseline over 6+ months)
├─ Architecture is stable (unlikely to change instance types)
└─ Financial commitment acceptable (1- or 3-year lock-in)
Use On-Demand when:
├─ Development/testing environments
├─ Unpredictable spiky workloads
├─ Short-term projects (<6 months)
└─ Evaluating new instance types
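The checklist above can be sketched as a small decision function. This is a minimal illustration, not a policy engine: the `Workload` fields and the `recommend_purchase_model` name are hypothetical, and the sketch assumes that all four "Reserve when" criteria must hold, which the framework does not state explicitly.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    runs_24x7: bool                # production-critical, always on
    stable_months: int             # months of stable baseline usage observed
    architecture_stable: bool      # instance types unlikely to change
    expected_lifetime_months: int  # how long the workload is expected to exist
    commitment_acceptable: bool    # finance accepts a multi-year lock-in


def recommend_purchase_model(w: Workload) -> str:
    """Return "commit" or "on-demand" following the checklist above."""
    # Short-term projects (<6 months) stay on-demand regardless of other traits.
    if w.expected_lifetime_months < 6:
        return "on-demand"
    # Assumption: every reserve criterion must be satisfied before committing.
    meets_all = (
        w.runs_24x7
        and w.stable_months >= 6
        and w.architecture_stable
        and w.commitment_acceptable
    )
    return "commit" if meets_all else "on-demand"
```

A stricter or looser rule (for example, committing only to a fraction of the baseline) may suit some teams better. The point is to make the criteria explicit and reviewable.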
For detailed commitment strategies and RI coverage analysis, see references/commitment-strategies.md.
Discount: 70-90% off on-demand pricing. Capacity can be reclaimed at short notice: a 2-minute warning on AWS, roughly 30 seconds on Azure and GCP.
Use Spot For: CI/CD workers, batch jobs, ML training (with checkpointing), Kubernetes workers, data analytics
Avoid Spot For: Stateful databases, real-time services, long-running jobs without checkpointing
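Checkpointing is what makes long jobs safe on Spot capacity: progress is persisted often enough that an interruption costs only the work done since the last save. The sketch below shows the pattern with a JSON file and an atomic rename. The function names, the `stop_after` parameter (which simulates an interruption), and the toy "work" are all assumptions for illustration. A real job would write to durable storage outside the instance.

```python
import json
import os
import tempfile


def save_checkpoint(path: str, state: dict) -> None:
    """Write state atomically so an interruption never leaves a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp_path, path)  # atomic when both paths share a filesystem


def load_checkpoint(path: str) -> dict:
    """Return the saved state, or a fresh state if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"next_step": 0, "total": 0}


def run_job(path: str, steps: int, stop_after=None) -> dict:
    """Process steps, checkpointing after each one.

    stop_after simulates a Spot interruption after that many steps in this
    run (hypothetical parameter, for demonstration only).
    """
    state = load_checkpoint(path)
    done_this_run = 0
    while state["next_step"] < steps:
        if stop_after is not None and done_this_run >= stop_after:
            break  # simulated interruption
        state["total"] += state["next_step"]  # stand-in for real work
        state["next_step"] += 1
        save_checkpoint(path, state)
        done_this_run += 1
    return state
```

Calling `run_job` again with the same path after an interruption resumes from the last saved step instead of starting over.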
Best Practices:
Target Utilization: 60-80% average (leave headroom for spikes)
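The target translates into a simple sizing rule: divide the capacity actually in use by the target utilization and round up. The sketch below assumes a 70% target (the midpoint of the range above) and a hypothetical function name. It works on averages only, so peak or percentile usage should be checked before resizing anything.

```python
import math


def rightsized_capacity(current_capacity: float, avg_utilization: float,
                        target_utilization: float = 0.70) -> int:
    """Capacity (e.g. vCPUs) that would bring average utilization to the target.

    Utilizations are fractions (0.25 for 25%). The result is rounded up so
    the resized resource does not run hotter than the target on average.
    """
    if not 0.0 < target_utilization <= 1.0:
        raise ValueError("target_utilization must be in (0, 1]")
    if not 0.0 <= avg_utilization <= 1.0:
        raise ValueError("avg_utilization must be in [0, 1]")
    used = current_capacity * avg_utilization
    # The small epsilon guards against floating-point noise before rounding up.
    return max(1, math.ceil(used / target_utilization - 1e-9))
```

For example, a 16-vCPU instance averaging 25% utilization uses about 4 vCPUs, so roughly 6 vCPUs would meet a 70% target. In practice the result is then rounded to the nearest available instance size.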
Compute Right-Sizing:
Database Right-Sizing:
Kubernetes Right-Sizing:
Storage Right-Sizing:
Right-Sizing Tools:
Resource Requests and Limits:
# Set requests = average usage (enables efficient bin-packing)