# FinOps Patterns

Best practices for managing cloud financial operations -- visibility, optimization, and governance of cloud spend at scale.

Cloud Cost Anatomy

Understanding where cloud money goes is the first step to controlling it.

Total Cloud Spend
  |
  +-- Compute (60-70%)
  |     +-- VMs / Instances
  |     +-- Containers (EKS, GKE, ECS)
  |     +-- Serverless (Lambda, Cloud Functions)
  |     +-- GPU / ML Training
  |
  +-- Storage (15-20%)
  |     +-- Block (EBS, Persistent Disks)
  |     +-- Object (S3, GCS, Blob)
  |     +-- Database (RDS, Cloud SQL, DynamoDB)
  |
  +-- Network (10-15%)
  |     +-- Data Transfer (egress is expensive)
  |     +-- Load Balancers
  |     +-- CDN
  |     +-- VPN / Direct Connect
  |
  +-- Other (5-10%)
        +-- Monitoring / Logging
        +-- DNS / Certificates
        +-- Support Plans
        +-- Marketplace
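
As a rough illustration of the breakdown above, the sketch below groups billing line items into the same four categories and reports each category's share of total spend. The service-to-category mapping and the dollar figures are made up for the example; a real bill would come from a cost export, which is not shown here.

```python
from collections import defaultdict

# Hypothetical mapping from billing service names to the four categories
# in the tree above. Anything unmapped falls into "Other".
CATEGORY_BY_SERVICE = {
    "EC2": "Compute", "EKS": "Compute", "Lambda": "Compute",
    "EBS": "Storage", "S3": "Storage", "RDS": "Storage",
    "DataTransfer": "Network", "ELB": "Network", "CloudFront": "Network",
}


def spend_shares(line_items: dict[str, float]) -> dict[str, float]:
    """Return each category's share of total spend, as a percentage."""
    totals: dict[str, float] = defaultdict(float)
    for service, cost in line_items.items():
        totals[CATEGORY_BY_SERVICE.get(service, "Other")] += cost
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {
        category: round(100 * amount / grand_total, 1)
        for category, amount in totals.items()
    }


# Made-up monthly figures in USD, for illustration only.
example_bill = {
    "EC2": 6500.0,
    "S3": 1200.0,
    "RDS": 600.0,
    "DataTransfer": 1100.0,
    "CloudWatch": 600.0,
}
shares = spend_shares(example_bill)
print(shares)
```

Comparing the computed shares against the typical ranges in the tree is one quick way to spot a category that deserves a closer look.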

Cost Driver	Typical Waste	Quick Win
Idle instances	20-30% of compute	Schedule dev/test shutdowns on nights and weekends
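
To make the scheduling quick win concrete, here is a minimal sketch of the decision logic, assuming a weekday window of 08:00 to 20:00 (the window is an assumption, not something the table specifies). It only decides whether an instance should be up and how many hours the schedule removes; actually stopping and starting instances is left out.

```python
from datetime import datetime

HOURS_PER_WEEK = 168


def should_be_running(now: datetime, start_hour: int = 8, stop_hour: int = 20) -> bool:
    """True during weekday working hours, False on nights and weekends."""
    is_weekday = now.weekday() < 5  # Monday=0 ... Friday=4
    return is_weekday and start_hour <= now.hour < stop_hour


def weekly_hours_saved(start_hour: int = 8, stop_hour: int = 20) -> int:
    """Hours per week an instance is off under this schedule."""
    return HOURS_PER_WEEK - 5 * (stop_hour - start_hour)


saved = weekly_hours_saved()
saved_pct = round(100 * saved / HOURS_PER_WEEK, 1)
print(f"Off {saved} of {HOURS_PER_WEEK} hours per week ({saved_pct}%)")
```

Under these assumed hours, a dev/test instance is off for roughly two thirds of the week, which is why the schedule is usually worth setting up before any deeper optimization.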

Instance Rightsizing Script

#!/usr/bin/env python3
"""rightsizing.py -- Analyze EC2 instances for rightsizing opportunities."""

import boto3
from datetime import datetime, timedelta, timezone
from dataclasses import dataclass

UTILIZATION_THRESHOLD = 40  # percent -- instances below this are candidates
LOOKBACK_DAYS = 14
MIN_DATAPOINTS = 100  # Require sufficient data before recommending


@dataclass
class RightsizeRecommendation:
    instance_id: str
    instance_type: str
    avg_cpu: float
    max_cpu: float
    avg_memory: float  # Requires CloudWatch agent; -1.0 means not collected
    recommended_type: str
    monthly_savings: float
    confidence: str  # "high" | "medium" | "low"


# Instance family downsizing map (simplified).
# m5.large, c5.large and r5.large are the smallest sizes in their families,
# so they have no downsize target here.
DOWNSIZE_MAP = {
    "m5.2xlarge": "m5.xlarge",
    "m5.xlarge": "m5.large",
    "c5.2xlarge": "c5.xlarge",
    "c5.xlarge": "c5.large",
    "r5.2xlarge": "r5.xlarge",
    "r5.xlarge": "r5.large",
    "t3.xlarge": "t3.large",
    "t3.large": "t3.medium",
}

# Approximate on-demand hourly pricing (us-east-1)
HOURLY_PRICING = {
    "m5.2xlarge": 0.384, "m5.xlarge": 0.192, "m5.large": 0.096,
    "c5.2xlarge": 0.340, "c5.xlarge": 0.170, "c5.large": 0.085,
    "r5.2xlarge": 0.504, "r5.xlarge": 0.252, "r5.large": 0.126,
    "t3.xlarge": 0.1664, "t3.large": 0.0832, "t3.medium": 0.0416,
}


def get_cpu_utilization(cw_client, instance_id: str) -> tuple[float, float]:
    """Return (avg_cpu, max_cpu) over the lookback period."""
    end = datetime.now(timezone.utc)
    start = end - timedelta(days=LOOKBACK_DAYS)

    response = cw_client.get_metric_statistics(
        Namespace="AWS/EC2",
        MetricName="CPUUtilization",
        Dimensions=[{"Name": "InstanceId", "Value": instance_id}],
        StartTime=start,
        EndTime=end,
        Period=3600,  # 1-hour intervals
        Statistics=["Average", "Maximum"],
    )

    datapoints = response.get("Datapoints", [])
    if len(datapoints) < MIN_DATAPOINTS:
        return -1.0, -1.0  # Insufficient data

    avg = sum(dp["Average"] for dp in datapoints) / len(datapoints)
    peak = max(dp["Maximum"] for dp in datapoints)
    return avg, peak


def analyze_fleet() -> list[RightsizeRecommendation]:
    """Analyze all running EC2 instances for rightsizing."""
    ec2 = boto3.client("ec2")
    cw = boto3.client("cloudwatch")
    recommendations = []

    # Get all running instances
    paginator = ec2.get_paginator("describe_instances")
    for page in paginator.paginate(
        Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
    ):
        for reservation in page["Reservations"]:
            for instance in reservation["Instances"]:
                instance_id = instance["InstanceId"]
                instance_type = instance["InstanceType"]

                # Skip instances not in our downsize map
                if instance_type not in DOWNSIZE_MAP:
                    continue

                avg_cpu, max_cpu = get_cpu_utilization(cw, instance_id)
                if avg_cpu < 0:
                    continue  # Insufficient data

                if avg_cpu < UTILIZATION_THRESHOLD:
                    recommended = DOWNSIZE_MAP[instance_type]
                    current_cost = HOURLY_PRICING.get(instance_type)
                    new_cost = HOURLY_PRICING.get(recommended)
                    if current_cost is None or new_cost is None:
                        continue  # No price data; savings would be misstated
                    monthly_savings = (current_cost - new_cost) * 730

                    confidence = "high" if max_cpu < 60 else "medium"

                    recommendations.append(RightsizeRecommendation(
                        instance_id=instance_id,
                        instance_type=instance_type,
                        avg_cpu=round(avg_cpu, 1),
                        max_cpu=round(max_cpu, 1),
                        avg_memory=-1.0,
                        recommended_type=recommended,
                        monthly_savings=round(monthly_savings, 2),
                        confidence=confidence,
                    ))

    # Sort by savings potential (highest first)
    recommendations.sort(key=lambda r: r.monthly_savings, reverse=True)
    return recommendations


if __name__ == "__main__":
    recs = analyze_fleet()
    total_savings = sum(r.monthly_savings for r in recs)

    print(f"\n{'='*80}")
    print(f"RIGHTSIZING RECOMMENDATIONS ({len(recs)} instances)")
    print(f"Potential monthly savings: ${total_savings:,.2f}")
    print(f"{'='*80}\n")

    for r in recs:
        print(f"  {r.instance_id}: {r.instance_type} -> {r.recommended_type}")
        print(f"    CPU avg={r.avg_cpu}% max={r.max_cpu}%")
        print(f"    Savings: ${r.monthly_savings}/mo  Confidence: {r.confidence}")
        print()
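
The savings and confidence rules in the script can be checked without touching AWS. The sketch below pulls them into two small functions (the function names are introduced here for illustration) and works through one step down the M5 family using the script's approximate prices.

```python
HOURS_PER_MONTH = 730  # same monthly-hours approximation the script uses


def monthly_savings(current_hourly: float, recommended_hourly: float) -> float:
    """Estimated monthly savings in USD from moving to the cheaper type."""
    return round((current_hourly - recommended_hourly) * HOURS_PER_MONTH, 2)


def confidence(max_cpu: float) -> str:
    """Mirror the script's rule: peaks under 60% CPU count as high confidence."""
    return "high" if max_cpu < 60 else "medium"


# m5.2xlarge ($0.384/h) -> m5.xlarge ($0.192/h), prices as listed in the script.
m5_saving = monthly_savings(0.384, 0.192)
print(f"m5.2xlarge -> m5.xlarge: ${m5_saving}/mo, confidence {confidence(45.0)}")
```

Keeping this arithmetic separate from the boto3 calls makes it easy to unit test and to swap in current prices, since the hard-coded figures are only approximations.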

AWS CloudWatch Budget Alarm

# cloudformation/budget-alerts.yaml
AWSTemplateFormatVersion: "2010-09-09"
Description: FinOps budget alerts with multi-threshold notifications

Parameters:
  TeamName:
    Type: String
    Description: Team name for cost allocation
  MonthlyBudget:
    Type: Number
    Description: Monthly budget in USD
  AlertEmail:
    Type: String
    Description: Email for budget notifications
  SlackWebhookArn:
    Type: String
    Description: ARN of Lambda that posts to Slack
  # Added parameter (assumption): CloudFormation has no arithmetic intrinsic,
  # so the spike threshold must be computed outside the template.
  DailySpendThreshold:
    Type: Number
    Description: Alarm threshold in USD, precomputed as (MonthlyBudget / 30) * 2

Resources:
  # SNS topic for budget alerts.
  # AWS Budgets can publish here only if a topic policy grants sns:Publish
  # to budgets.amazonaws.com (not shown).
  BudgetAlertTopic:
    Type: AWS::SNS::Topic
    Properties:
      TopicName: !Sub "${TeamName}-budget-alerts"
      Subscription:
        - Protocol: email
          Endpoint: !Ref AlertEmail
        - Protocol: lambda
          Endpoint: !Ref SlackWebhookArn

  # Monthly budget with progressive thresholds
  TeamBudget:
    Type: AWS::Budgets::Budget
    Properties:
      Budget:
        BudgetName: !Sub "${TeamName}-monthly-budget"
        BudgetLimit:
          Amount: !Ref MonthlyBudget
          Unit: USD
        TimeUnit: MONTHLY
        BudgetType: COST
        CostFilters:
          TagKeyValue:
            - !Sub "user:team$${TeamName}"
      NotificationsWithSubscribers:
        # 50% threshold -- informational
        - Notification:
            NotificationType: ACTUAL
            ComparisonOperator: GREATER_THAN
            Threshold: 50
            ThresholdType: PERCENTAGE
          Subscribers:
            - SubscriptionType: SNS
              Address: !Ref BudgetAlertTopic
        # 80% threshold -- warning
        - Notification:
            NotificationType: ACTUAL
            ComparisonOperator: GREATER_THAN
            Threshold: 80
            ThresholdType: PERCENTAGE
          Subscribers:
            - SubscriptionType: SNS
              Address: !Ref BudgetAlertTopic
        # 100% threshold -- critical
        - Notification:
            NotificationType: ACTUAL
            ComparisonOperator: GREATER_THAN
            Threshold: 100
            ThresholdType: PERCENTAGE
          Subscribers:
            - SubscriptionType: SNS
              Address: !Ref BudgetAlertTopic
        # Forecasted to exceed -- early warning
        - Notification:
            NotificationType: FORECASTED
            ComparisonOperator: GREATER_THAN
            Threshold: 100
            ThresholdType: PERCENTAGE
          Subscribers:
            - SubscriptionType: SNS
              Address: !Ref BudgetAlertTopic

  # CloudWatch alarm for sudden spend spikes.
  # Note: EstimatedCharges is a month-to-date running total for the account and
  # is published only in us-east-1, so this alarm fires once cumulative charges
  # pass the threshold; it does not measure spend per day.
  DailySpendAlarm:
    Type: AWS::CloudWatch::Alarm
    Properties:
      AlarmName: !Sub "${TeamName}-daily-spend-spike"
      AlarmDescription: "Estimated charges exceed 2x the expected daily average"
      Namespace: AWS/Billing
      MetricName: EstimatedCharges
      Dimensions:
        - Name: Currency
          Value: USD
      Statistic: Maximum
      Period: 86400  # 24 hours
      EvaluationPeriods: 1
      # Threshold = (monthly budget / 30 days) * 2 (spike factor)
      Threshold: !Ref DailySpendThreshold
      ComparisonOperator: GreaterThanThreshold
      AlarmActions:
        - !Ref BudgetAlertTopic

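The template's spike threshold is described as (monthly budget / 30 days) * 2, and its budget notifications fire at 50%, 80%, and 100% of the limit. Since CloudFormation cannot do that arithmetic itself, a small helper along these lines could compute the value before deployment. This is a sketch only; the function names and the "ok" level are assumptions made for the example.

```python
def daily_spike_threshold(
    monthly_budget: float, spike_factor: float = 2.0, days_in_month: int = 30
) -> float:
    """Expected daily spend multiplied by the spike factor, in USD."""
    return round(monthly_budget / days_in_month * spike_factor, 2)


def alert_level(actual_spend: float, monthly_budget: float) -> str:
    """Map spend to the template's tiers (GREATER_THAN 50 / 80 / 100 percent)."""
    pct = 100 * actual_spend / monthly_budget
    if pct > 100:
        return "critical"
    if pct > 80:
        return "warning"
    if pct > 50:
        return "informational"
    return "ok"  # hypothetical label for spend at or below 50%


threshold = daily_spike_threshold(3000.0)
print(f"Spike threshold for a $3,000 budget: ${threshold}")
print(alert_level(2500.0, 3000.0))
```

The comparisons are strict to match the template's GREATER_THAN operator, so spend at exactly 50% of budget does not yet trigger the informational tier.
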
FinOps Maturity Model

Dimension	Crawl	Walk	Run
Visibility	Monthly bill review	Tagged cost allocation by team	Real-time cost dashboards per service
Allocation	Single account, no tagging	Cost centers with basic tags	Full showback/chargeback by product line
Optimization	Ad-hoc rightsizing	Quarterly review with recommendations	Automated rightsizing and scaling policies
Forecasting	None	Spreadsheet-based projections	ML-based anomaly detection and forecasts
Governance	No budgets	Account-level budgets	Per-team budgets with automated enforcement
Commitment	On-demand only	Some Reserved Instances	Savings Plans + Spot + RI portfolio managed
Culture	Central IT pays the bill	Engineering aware of costs	Engineers own cost as a feature metric
Tooling	AWS Console only	Cost Explorer + basic reports	FinOps platform (Kubecost, CloudHealth, etc.)
