FinOps AI Expert | Skills Pool
Cost optimization for AI workloads: model selection, GPU sizing, commitment strategies, and multi-cloud cost management
You are an expert in Financial Operations (FinOps) for AI workloads, specializing in cost optimization across model selection, infrastructure sizing, commitment strategies, and multi-cloud cost management.
AI Cost Components
Cost Breakdown Framework
┌─────────────────────────────────────────────────────────────────┐
│ AI WORKLOAD COST STACK │
├─────────────────────────────────────────────────────────────────┤
│ │
│ INFERENCE COSTS (60-80% typical) │
│ ├── Token costs (input + output) │
│ ├── GPU compute time │
│ └── API call overhead │
│ │
│ INFRASTRUCTURE COSTS (15-30%) │
│ ├── GPU/Compute instances │
│ ├── Storage (models, vectors, data) │
│ ├── Networking (egress, load balancers) │
│ └── Supporting services (DBs, queues, caches) │
│ │
│ DEVELOPMENT COSTS (5-15%) │
│ ├── Training/Fine-tuning compute │
│ ├── Experimentation │
│ └── Development environments │
│ │
└─────────────────────────────────────────────────────────────────┘
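As a quick sanity check, the stack above can be turned into a rough allocator. A minimal sketch, using illustrative midpoints of the typical ranges rather than measured values for any specific workload:

```python
# Rough sketch: split a monthly AI bill across the cost stack above.
# The percentages are illustrative midpoints of the typical ranges, not
# measured values; replace them with figures from your own billing export.
def estimate_cost_stack(total_monthly_cost: float) -> dict:
    typical_split = {
        "inference": 0.70,        # 60-80% typical
        "infrastructure": 0.22,   # 15-30%
        "development": 0.08,      # 5-15%
    }
    return {
        component: round(total_monthly_cost * share, 2)
        for component, share in typical_split.items()
    }

# estimate_cost_stack(50_000)
# {'inference': 35000.0, 'infrastructure': 11000.0, 'development': 4000.0}
```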
Quick Install
npx skillvault add majiayu000/majiayu000-claude-skill-registry-data-data-finops-ai-skill-md
LLM Pricing Comparison
API Pricing (Per 1M Tokens)

| Provider | Model | Input | Output | Context |
|---|---|---|---|---|
| OpenAI | GPT-4o | $2.50 | $10.00 | 128K |
| OpenAI | GPT-4o-mini | $0.15 | $0.60 | 128K |
| OpenAI | GPT-4 Turbo | $10.00 | $30.00 | 128K |
| Anthropic | Claude 3.5 Sonnet | $3.00 | $15.00 | 200K |
| Anthropic | Claude 3 Haiku | $0.25 | $1.25 | 200K |
| Google | Gemini 1.5 Pro | $1.25 | $5.00 | 1M |
| Google | Gemini 1.5 Flash | $0.075 | $0.30 | 1M |
| AWS Bedrock | Claude 3.5 Sonnet | $3.00 | $15.00 | 200K |
| AWS Bedrock | Llama 3.1 70B | $2.65 | $3.50 | 128K |
| Azure OpenAI | GPT-4o | $5.00 | $15.00 | 128K |
| OCI GenAI | Command R+ (DAC) | Included | Included | - |
Cost Per Query Estimation
class LLMCostCalculator:
PRICING = {
"gpt-4o": {"input": 2.50, "output": 10.00},
"gpt-4o-mini": {"input": 0.15, "output": 0.60},
"claude-3-5-sonnet": {"input": 3.00, "output": 15.00},
"claude-3-haiku": {"input": 0.25, "output": 1.25},
"llama-3-70b": {"input": 2.65, "output": 3.50},
}
def calculate_query_cost(
self,
model: str,
input_tokens: int,
output_tokens: int
) -> float:
"""Calculate cost for a single query in dollars"""
pricing = self.PRICING[model]
input_cost = (input_tokens / 1_000_000) * pricing["input"]
output_cost = (output_tokens / 1_000_000) * pricing["output"]
return input_cost + output_cost
def calculate_monthly_cost(
self,
model: str,
queries_per_day: int,
avg_input_tokens: int,
avg_output_tokens: int
) -> dict:
"""Estimate monthly costs"""
daily_cost = self.calculate_query_cost(
model,
queries_per_day * avg_input_tokens,
queries_per_day * avg_output_tokens
)
monthly_cost = daily_cost * 30
return {
"model": model,
"daily_queries": queries_per_day,
"daily_cost": f"${daily_cost:.2f}",
"monthly_cost": f"${monthly_cost:.2f}",
"annual_cost": f"${monthly_cost * 12:.2f}"
}
# Example
calc = LLMCostCalculator()
# RAG chatbot: 10K queries/day, 2000 input tokens, 500 output tokens
calc.calculate_monthly_cost("gpt-4o", 10000, 2000, 500)
# {'monthly_cost': '$300.00'} # GPT-4o
calc.calculate_monthly_cost("claude-3-haiku", 10000, 2000, 500)
# {'monthly_cost': '$33.75'} # 89% savings with Haiku
Model Selection for Cost Optimization
Decision Matrix
┌─────────────────────────────────────────────────────────────────┐
│ MODEL SELECTION BY USE CASE │
├─────────────────────────────────────────────────────────────────┤
│ │
│ TASK COMPLEXITY │ RECOMMENDED │ COST/1K QUERIES │
│ ────────────────────┼───────────────────────┼───────────────── │
│ Simple Q&A │ GPT-4o-mini, Haiku │ $0.05 - $0.20 │
│ Classification │ Haiku, Gemini Flash │ $0.02 - $0.10 │
│ Summarization │ GPT-4o-mini, Sonnet │ $0.10 - $0.50 │
│ RAG (retrieval) │ Sonnet, GPT-4o-mini │ $0.20 - $1.00 │
│ Code generation │ Sonnet, GPT-4o │ $0.50 - $2.00 │
│ Complex reasoning │ GPT-4o, Claude Opus │ $1.00 - $5.00 │
│ Agent tasks │ Sonnet, GPT-4o │ $2.00 - $10.00 │
│ │
└─────────────────────────────────────────────────────────────────┘
Model Cascading Pattern
class ModelCascade:
"""Route to cheapest model that can handle the task"""
def __init__(self):
self.models = [
{"name": "claude-3-haiku", "cost": 0.25, "capability": 0.7},
{"name": "gpt-4o-mini", "cost": 0.15, "capability": 0.75},
{"name": "claude-3-5-sonnet", "cost": 3.00, "capability": 0.95},
{"name": "gpt-4o", "cost": 2.50, "capability": 0.98},
]
async def route(self, query: str, complexity_score: float) -> str:
"""Route to appropriate model based on complexity"""
for model in sorted(self.models, key=lambda x: x["cost"]):
if model["capability"] >= complexity_score:
return model["name"]
return self.models[-1]["name"] # Fallback to most capable
async def cascade_with_fallback(self, query: str) -> dict:
"""Try cheap model first, escalate if needed"""
# Start with cheapest
response = await self.call_model("claude-3-haiku", query)
# Check confidence
if response.confidence < 0.8:
# Escalate to better model
response = await self.call_model("claude-3-5-sonnet", query)
return response
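The cascade above assumes a complexity_score already exists. A minimal sketch of a heuristic scorer (the keywords, weights, and thresholds are illustrative assumptions; production routers typically use a small classifier model or historical routing data):

```python
# Heuristic sketch: produce a complexity_score in [0, 1] for ModelCascade.route().
# Keyword lists and weights are illustrative assumptions, not a calibrated classifier.
def estimate_complexity(query: str) -> float:
    score = 0.3  # baseline: simple Q&A
    reasoning_signals = ["why", "compare", "analyze", "step by step", "trade-off"]
    code_signals = ["refactor", "implement", "debug", "write a function"]

    lowered = query.lower()
    if any(s in lowered for s in reasoning_signals):
        score += 0.3
    if any(s in lowered for s in code_signals):
        score += 0.3
    if len(query.split()) > 200:  # long, multi-part requests
        score += 0.2
    return min(score, 1.0)

# estimate_complexity("What is our refund policy?")                       -> 0.3
# estimate_complexity("Refactor this module and analyze the trade-offs")  -> 0.9
```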
GPU Cost Optimization
GPU Pricing Comparison

| Provider | GPU | vCPU | Memory | Hourly | Monthly |
|---|---|---|---|---|---|
| AWS | A10G | 4 | 24GB | $1.21 | $870 |
| AWS | A100 40GB | 12 | 192GB | $3.67 | $2,640 |
| AWS | H100 | 192 | 2TB | $12.36 | $8,900 |
| Azure | A10 | 6 | 112GB | $1.14 | $820 |
| Azure | A100 80GB | 24 | 220GB | $3.40 | $2,450 |
| GCP | A100 40GB | 12 | 85GB | $3.67 | $2,640 |
| OCI | A10 | 15 | 240GB | $1.00 | $720 |
| Lambda | A100 | 30 | 200GB | $1.29 | $930 |
| RunPod | A100 | - | 80GB | $1.89 | $1,360 |
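The hourly rates above feed a build-vs-buy calculation: at what monthly token volume does a dedicated GPU beat per-token API pricing? A minimal sketch (it deliberately ignores ops overhead and assumes one GPU can actually serve the resulting volume):

```python
# Sketch: monthly token volume at which a self-hosted GPU undercuts API pricing.
# Ignores engineering/ops overhead and assumes the GPU can sustain the volume;
# plug in your own blended (input+output) API price and a GPU rate from the table.
def self_host_breakeven_tokens(
    gpu_monthly_cost: float,          # e.g. $870/month for an A10G
    api_price_per_1m_tokens: float,   # blended input+output price per 1M tokens
) -> float:
    return (gpu_monthly_cost / api_price_per_1m_tokens) * 1_000_000

# self_host_breakeven_tokens(870, 0.60)  -> 1.45e9
# Below ~1.45B tokens/month the API is cheaper; above it, self-hosting starts
# to win on raw unit cost.
```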
Right-Sizing GPU Workloads
class GPUSizer:
"""Recommend GPU size based on model and workload"""
GPU_MEMORY = {
"A10G": 24,
"L4": 24,
"A100-40GB": 40,
"A100-80GB": 80,
"H100": 80,
}
MODEL_MEMORY = {
# Model: (FP16 size GB, Quantized GB)
"llama-3.1-8B": (16, 6),
"llama-3.1-70B": (140, 42),
"llama-3.1-405B": (810, 250),
"mistral-7B": (14, 5),
"mixtral-8x7B": (96, 32),
}
def recommend_gpu(
self,
model: str,
batch_size: int = 1,
use_quantization: bool = True
) -> dict:
"""Recommend GPU configuration"""
base_mem, quant_mem = self.MODEL_MEMORY.get(model, (10, 4))
model_mem = quant_mem if use_quantization else base_mem
# Add overhead for KV cache and batch
kv_cache_per_batch = 2 # GB per batch slot
total_mem = model_mem + (kv_cache_per_batch * batch_size) + 2 # 2GB overhead
# Find suitable GPU
suitable_gpus = []
for gpu, mem in self.GPU_MEMORY.items():
if mem >= total_mem:
suitable_gpus.append(gpu)
if not suitable_gpus:
# Need multi-GPU
return {
"recommendation": "multi-gpu",
"min_gpus": (total_mem // 80) + 1,
"gpu_type": "A100-80GB or H100"
}
return {
"recommendation": suitable_gpus[0],
"memory_required": f"{total_mem:.1f}GB",
"batch_size": batch_size,
"quantization": use_quantization
}
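A quick usage example against the memory figures above:

```python
sizer = GPUSizer()

sizer.recommend_gpu("llama-3.1-70B", batch_size=8, use_quantization=True)
# {'recommendation': 'A100-80GB', 'memory_required': '60.0GB',
#  'batch_size': 8, 'quantization': True}

sizer.recommend_gpu("llama-3.1-70B", batch_size=8, use_quantization=False)
# {'recommendation': 'multi-gpu', 'min_gpus': 2, 'gpu_type': 'A100-80GB or H100'}
```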
Commitment Strategies
Reserved Capacity Comparison

| Provider | Commitment | Discount | Term |
|---|---|---|---|
| Azure | PTU (Provisioned Throughput) | ~30% | Monthly |
| OCI | DAC (Dedicated AI Cluster) | Flat rate | Monthly |
| AWS | Savings Plans (Compute) | 20-30% | 1-3 years |
| GCP | CUDs (Committed Use) | 20-57% | 1-3 years |
Break-Even Analysis
def commitment_breakeven(
on_demand_monthly: float,
committed_monthly: float,
commitment_term_months: int,
upfront_cost: float = 0
) -> dict:
"""Calculate break-even point for commitments"""
monthly_savings = on_demand_monthly - committed_monthly
total_commitment_cost = (committed_monthly * commitment_term_months) + upfront_cost
total_on_demand_cost = on_demand_monthly * commitment_term_months
break_even_months = upfront_cost / monthly_savings if monthly_savings > 0 else float('inf')
return {
"monthly_savings": f"${monthly_savings:.2f}",
"total_savings": f"${total_on_demand_cost - total_commitment_cost:.2f}",
"break_even_months": round(break_even_months, 1),
"roi_percentage": f"{((total_on_demand_cost - total_commitment_cost) / total_commitment_cost) * 100:.1f}%"
}
# Example: Azure PTU commitment
commitment_breakeven(
on_demand_monthly=5000, # Pay-as-you-go
committed_monthly=3500, # PTU pricing
commitment_term_months=12,
upfront_cost=0
)
# {'monthly_savings': '$1500.00', 'total_savings': '$18000.00', 'roi_percentage': '42.9%'}
Cost Monitoring & Alerts
Tagging Strategy
# Required tags for AI workloads
ai_cost_tags:
mandatory:
- project: "ai-platform"
- environment: "prod/staging/dev"
- cost_center: "engineering"
- workload_type: "inference/training/embedding"
- model: "gpt-4o/claude-3/llama-3"
recommended:
- team: "ml-platform"
- owner: "[email protected] "
- budget_code: "AI-2024-Q1"
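Once the tags above are applied (and activated as cost allocation tags), per-project spend can be pulled programmatically. A minimal sketch using the AWS Cost Explorer API via boto3; the date range and tag key are placeholders:

```python
# Sketch: monthly spend grouped by the `project` cost allocation tag.
# Assumes the tag is activated in the AWS Billing console and credentials are configured.
import boto3

def ai_cost_by_project(start: str, end: str) -> dict:
    ce = boto3.client("ce")
    resp = ce.get_cost_and_usage(
        TimePeriod={"Start": start, "End": end},  # e.g. "2024-01-01", "2024-02-01"
        Granularity="MONTHLY",
        Metrics=["UnblendedCost"],
        GroupBy=[{"Type": "TAG", "Key": "project"}],
    )
    costs: dict = {}
    for period in resp["ResultsByTime"]:
        for group in period["Groups"]:
            tag_value = group["Keys"][0]  # e.g. "project$ai-platform"
            amount = float(group["Metrics"]["UnblendedCost"]["Amount"])
            costs[tag_value] = costs.get(tag_value, 0.0) + amount
    return costs
```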
Budget Alerts
# Terraform for AWS Budget Alert
resource "aws_budgets_budget" "ai_monthly" {
name = "ai-platform-monthly"
budget_type = "COST"
limit_amount = "10000"
limit_unit = "USD"
time_unit = "MONTHLY"
cost_filter {
name = "TagKeyValue"
values = ["user:project$ai-platform"]
}
notification {
comparison_operator = "GREATER_THAN"
threshold = 80
threshold_type = "PERCENTAGE"
notification_type = "ACTUAL"
subscriber_email_addresses = ["[email protected] "]
}
notification {
comparison_operator = "GREATER_THAN"
threshold = 100
    threshold_type = "PERCENTAGE"
notification_type = "FORECASTED"
subscriber_email_addresses = ["[email protected] ", "[email protected] "]
}
}
Cost Dashboard Metrics
FINOPS_METRICS = {
# Cost metrics
"cost_per_query": "Total cost / number of queries",
"cost_per_token": "Total cost / tokens processed",
"cost_per_user": "Total cost / active users",
"cost_efficiency": "Output value / total cost",
# Utilization metrics
"gpu_utilization": "Active GPU time / provisioned GPU time",
"api_efficiency": "Successful calls / total calls",
"cache_hit_rate": "Cached responses / total requests",
# Optimization metrics
"model_routing_savings": "Baseline cost - actual cost",
"commitment_utilization": "Committed capacity used / purchased",
"spot_savings": "On-demand equivalent - actual spot cost"
}
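A sketch of computing the core unit-economics metrics from raw counters; the counter names are assumptions about what a billing export or metrics pipeline would expose:

```python
# Sketch: turn raw counters into the unit-economics metrics defined above.
# Input names (total_cost_usd, query_count, ...) are assumed pipeline outputs.
def compute_finops_metrics(
    total_cost_usd: float,
    query_count: int,
    tokens_processed: int,
    active_users: int,
    gpu_active_hours: float,
    gpu_provisioned_hours: float,
) -> dict:
    return {
        "cost_per_query": total_cost_usd / max(query_count, 1),
        "cost_per_1k_tokens": total_cost_usd / max(tokens_processed, 1) * 1000,
        "cost_per_user": total_cost_usd / max(active_users, 1),
        "gpu_utilization": gpu_active_hours / max(gpu_provisioned_hours, 1e-9),
    }

# compute_finops_metrics(12_000, 3_000_000, 7_500_000_000, 40_000, 520, 1440)
# -> cost_per_query 0.004, cost_per_1k_tokens 0.0016, cost_per_user 0.30,
#    gpu_utilization ~0.36
```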
Cost Optimization Techniques
1. Prompt Engineering for Cost
class CostAwarePrompting:
"""Optimize prompts for cost efficiency"""
def optimize_prompt(self, prompt: str, max_tokens: int = None) -> str:
"""Reduce prompt tokens while maintaining quality"""
# Remove redundant whitespace
optimized = ' '.join(prompt.split())
# Use abbreviations for common patterns
optimized = optimized.replace("Please provide", "Provide")
optimized = optimized.replace("I would like you to", "")
optimized = optimized.replace("Can you please", "")
return optimized
def batch_similar_requests(self, requests: list) -> list:
"""Batch similar requests to reduce overhead"""
# Group by similar prompts
batches = {}
for req in requests:
key = self.get_prompt_signature(req)
if key not in batches:
batches[key] = []
batches[key].append(req)
return list(batches.values())
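To put a dollar figure on prompt trimming, a rough sketch using the common ~4 characters-per-token approximation (a heuristic, not a real tokenizer):

```python
# Rough sketch: estimate monthly savings from a shorter system/user prompt.
# Uses ~4 characters per token as an approximation; use the provider's token
# counter for anything beyond an order-of-magnitude estimate.
def estimate_prompt_savings(
    original_prompt: str,
    optimized_prompt: str,
    queries_per_month: int,
    input_price_per_1m: float,   # e.g. $2.50 for GPT-4o input
) -> dict:
    tokens_before = len(original_prompt) / 4
    tokens_after = len(optimized_prompt) / 4
    tokens_saved = max(tokens_before - tokens_after, 0)
    monthly_savings = tokens_saved * queries_per_month / 1_000_000 * input_price_per_1m
    return {
        "tokens_saved_per_query": round(tokens_saved),
        "monthly_savings": f"${monthly_savings:.2f}",
    }
```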
2. Caching Strategy
import hashlib
class SemanticCache:
"""Cache LLM responses by semantic similarity"""
def __init__(self, similarity_threshold: float = 0.95):
self.cache = {}
self.threshold = similarity_threshold
def get_cache_key(self, prompt: str) -> str:
"""Generate cache key from prompt"""
return hashlib.sha256(prompt.encode()).hexdigest()
async def get_or_generate(
self,
prompt: str,
generate_fn,
ttl_seconds: int = 3600
):
"""Return cached response or generate new one"""
cache_key = self.get_cache_key(prompt)
# Check exact match
if cache_key in self.cache:
return self.cache[cache_key]
# Check semantic similarity
similar = await self.find_similar(prompt)
if similar:
return similar
# Generate new response
response = await generate_fn(prompt)
self.cache[cache_key] = response
return response
# Cache hit rates: 30-60% typical for production workloads
# Cost savings: 30-50% on inference costs
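find_similar is left abstract in the class above. A minimal sketch of the semantic-match step, assuming prompt embeddings are stored alongside cached responses (the index structure and embedding source are assumptions, not part of the original class):

```python
# Sketch of the semantic-similarity lookup referenced by SemanticCache.
# Assumes an index of {cache_key: (embedding, response)}; the embeddings themselves
# would come from any embedding model (OpenAI, Bedrock, local) -- not shown here.
import math

def cosine_similarity(a: list, b: list) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def find_similar_cached(query_embedding: list, embedding_index: dict, threshold: float = 0.95):
    """Return the cached response whose prompt embedding is closest, if above threshold."""
    best_response, best_score = None, 0.0
    for _key, (embedding, response) in embedding_index.items():
        score = cosine_similarity(query_embedding, embedding)
        if score > best_score:
            best_response, best_score = response, score
    return best_response if best_score >= threshold else None
```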
3. Spot/Preemptible Instances
class SpotInstanceStrategy:
"""Manage spot instances for AI workloads"""
SPOT_SAVINGS = {
"aws": 0.70, # 70% savings typical
"azure": 0.60,
"gcp": 0.65,
}
def recommend_spot_strategy(self, workload_type: str) -> dict:
"""Recommend spot usage based on workload"""
strategies = {
"batch_inference": {
"spot_eligible": True,
"percentage": 100,
"reason": "Interruptible, can retry"
},
"training": {
"spot_eligible": True,
"percentage": 80,
"reason": "Checkpoint frequently, retry on interrupt"
},
"real_time_inference": {
"spot_eligible": False,
"percentage": 0,
"reason": "Latency-sensitive, needs reliability"
},
"dev_environment": {
"spot_eligible": True,
"percentage": 100,
"reason": "Non-critical, cost optimization priority"
}
}
return strategies.get(workload_type, {"spot_eligible": False})
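Combining the eligibility table with the typical savings rates gives a quick blended estimate. A sketch (the on-demand cost is an illustrative input):

```python
# Sketch: blended monthly savings from moving the spot-eligible share of a
# workload onto spot, using the typical rates and percentages defined above.
def estimate_spot_savings(
    strategy: SpotInstanceStrategy,
    workload_type: str,
    cloud: str,
    on_demand_monthly: float,
) -> dict:
    rec = strategy.recommend_spot_strategy(workload_type)
    if not rec.get("spot_eligible"):
        return {"workload": workload_type, "monthly_savings": "$0.00"}
    spot_share = rec["percentage"] / 100
    savings_rate = strategy.SPOT_SAVINGS.get(cloud, 0.60)
    monthly_savings = on_demand_monthly * spot_share * savings_rate
    return {
        "workload": workload_type,
        "spot_share": f"{rec['percentage']}%",
        "monthly_savings": f"${monthly_savings:.2f}",
    }

# estimate_spot_savings(SpotInstanceStrategy(), "training", "aws", 20_000)
# -> 80% of the fleet on spot at ~70% discount, roughly $11,200/month saved
```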
Multi-Cloud Cost Arbitrage
Provider Selection by Cost
class MultiCloudCostRouter:
"""Route workloads to cheapest provider"""
PROVIDER_COSTS = {
"embedding": {
"aws_titan": 0.0001,
"azure_ada": 0.0001,
"cohere": 0.0001,
"openai": 0.00013,
},
"chat": {
"aws_claude_haiku": 0.00025,
"azure_gpt35": 0.0005,
"openai_gpt4o_mini": 0.00015,
}
}
def get_cheapest_provider(self, task_type: str) -> tuple:
"""Return cheapest provider for task"""
costs = self.PROVIDER_COSTS.get(task_type, {})
if not costs:
return None, None
cheapest = min(costs.items(), key=lambda x: x[1])
return cheapest
def calculate_arbitrage_savings(
self,
current_provider: str,
current_cost: float,
volume: int
) -> dict:
"""Calculate savings from switching providers"""
alternatives = []
for task, providers in self.PROVIDER_COSTS.items():
for provider, cost in providers.items():
if cost < current_cost:
monthly_savings = (current_cost - cost) * volume * 30
alternatives.append({
"provider": provider,
"cost": cost,
"monthly_savings": f"${monthly_savings:.2f}"
})
return sorted(alternatives, key=lambda x: float(x["monthly_savings"].replace("$", "")), reverse=True)
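A quick usage example against the unit prices above (real routing would also weigh latency, quotas, and data-residency constraints):

```python
router = MultiCloudCostRouter()

router.get_cheapest_provider("chat")
# ('openai_gpt4o_mini', 0.00015)

router.get_cheapest_provider("embedding")
# ('aws_titan', 0.0001)  # ties among the $0.0001 options resolve to the first listed
```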
FinOps Maturity Model
┌─────────────────────────────────────────────────────────────────┐
│ AI FINOPS MATURITY LEVELS │
├─────────────────────────────────────────────────────────────────┤
│ │
│ LEVEL 1: CRAWL │
│ ├── Basic cost visibility │
│ ├── Manual cost tracking │
│ └── Simple tagging │
│ │
│ LEVEL 2: WALK │
│ ├── Automated cost allocation │
│ ├── Budget alerts │
│ ├── Model selection guidelines │
│ └── Basic optimization (caching, batching) │
│ │
│ LEVEL 3: RUN │
│ ├── Real-time cost dashboards │
│ ├── Automated cost anomaly detection │
│ ├── Commitment management │
│ ├── Multi-cloud cost optimization │
│ └── Cost-aware model routing │
│ │
│ LEVEL 4: FLY │
│ ├── Predictive cost modeling │
│ ├── Automated scaling based on cost/performance │
│ ├── Business value attribution │
│ └── Continuous optimization loops │
│ │
└─────────────────────────────────────────────────────────────────┘