Review cloud costs against budget and identify optimization opportunities
# GCP (budgets show configured limits, not actual spend)
gcloud billing accounts list
gcloud billing budgets list --billing-account={ACCOUNT_ID}
# For actual spend, use the billing export in BigQuery
# AWS (GNU date syntax; on macOS/BSD use `date -v-30d +%Y-%m-%d` for the start date)
aws ce get-cost-and-usage \
  --time-period "Start=$(date -d '30 days ago' +%Y-%m-%d),End=$(date +%Y-%m-%d)" \
  --granularity MONTHLY \
  --metrics BlendedCost \
  --group-by Type=DIMENSION,Key=SERVICE
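For the BigQuery route mentioned above, here is a minimal sketch of a per-service cost query over the last 30 days. The table name `my-project.billing.gcp_billing_export_v1_XXXXXX` is a placeholder, not a real table; substitute your own export table. The columns used (`service.description`, `cost`, `usage_start_time`) assume the standard usage cost export schema, and the sum is before credits.

```shell
#!/usr/bin/env bash
# Hypothetical placeholder: replace with your own billing export table.
BILLING_TABLE="${BILLING_TABLE:-my-project.billing.gcp_billing_export_v1_XXXXXX}"

# Build a standard-SQL query that sums cost per service over the last 30 days.
build_cost_query() {
  local table="$1"
  cat <<SQL
SELECT
  service.description AS service_name,
  ROUND(SUM(cost), 2) AS cost_30d
FROM \`${table}\`
WHERE usage_start_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
GROUP BY service_name
ORDER BY cost_30d DESC
SQL
}

# Run the query with the bq CLI (requires an authenticated Cloud SDK).
run_cost_query() {
  bq query --use_legacy_sql=false "$(build_cost_query "$BILLING_TABLE")"
}
```

Source the file and call `run_cost_query`, or call `build_cost_query "<table>"` alone to inspect the SQL before running it.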
Compute serving (N APIs × 2 clouds): $___/mo
Compute training (Spot, monthly avg): $___/mo
Databases (Cloud SQL + RDS): $___/mo
Storage (GCS + S3): $___/mo
Registry (Artifact Registry + ECR): $___/mo
Monitoring and Logging: $___/mo
TOTAL: $___/mo
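Once the blanks are filled in, the TOTAL line can be checked mechanically. A minimal sketch, assuming the breakdown is saved to a file (the name `costs.txt` is hypothetical) with one `label: $amount/mo` line per category:

```shell
#!/usr/bin/env bash
# Sum every "label: $<amount>/mo" line in the given file, skipping TOTAL.
sum_costs() {
  awk -F': *[$]' '
    /^TOTAL/ { next }                 # do not count the existing total
    NF == 2 {
      sub(/\/mo.*/, "", $2)           # drop the "/mo" suffix
      gsub(/,/, "", $2)               # drop thousands separators
      total += $2
    }
    END { printf "%.2f\n", total }
  ' "$1"
}

# Usage (hypothetical file name):
# sum_costs costs.txt
```

Comparing this output with the hand-entered TOTAL catches line items that were updated without refreshing the sum.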
| Rule | Status | Action if Violated |
|---|---|---|
| Training on Spot/Preemptible | ✅/❌ | Switch to Spot/Preemptible instances (roughly 70% savings; actual discount varies by instance type and region) |
| Serving on On-Demand | ✅/❌ | Do not change; availability required |
| CPU-only HPA (no idle pods) | ✅/❌ | Fix HPA to avoid over-provisioning |
| Lifecycle policies on buckets | ✅/❌ | Archive after N days, delete after M |
| Budget alerts at 50%/90% | ✅/❌ | Configure in Terraform |
| Non-prod clusters destroyed | ✅/❌ | terraform destroy -var-file=staging.tfvars |
Update the TCO section in service READMEs and relevant ADRs with the current figures from the cost breakdown.
# GCP
terraform plan -target=google_billing_budget.ml_budget
# AWS
terraform plan -target=aws_budgets_budget.ml_budget
Verify alerts fire at 50% and 90% of monthly budget.
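To know what spend should trigger each alert, the percentage thresholds can be converted to dollar amounts. A small sketch; the budget figure in the usage comment is a made-up example, not a value from this project:

```shell
#!/usr/bin/env bash
# Print the dollar amount at which each alert threshold (in percent) is crossed.
# Arguments: monthly budget, followed by one or more percentages.
alert_thresholds() {
  local budget="$1"; shift
  local pct
  for pct in "$@"; do
    awk -v b="$budget" -v p="$pct" \
      'BEGIN { printf "%d%%: $%.2f\n", p, b * p / 100 }'
  done
}

# Example with a hypothetical $4,000 monthly budget:
# alert_thresholds 4000 50 90
```

The resulting amounts can be compared against current month-to-date spend to judge whether an alert should already have fired.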