Use when AI inference costs are growing unexpectedly, when comparing model choices by cost/quality ratio, or when optimizing token usage across a multi-model pipeline — produces an actionable cost reduction plan
Audit AI inference costs and optimize token usage across multi-model pipelines. This is not about cutting capabilities — it is about eliminating waste, right-sizing models, and keeping costs predictable.
Use the most capable model necessary — not the most capable model available.
| Tier | Models | Best for |
|---|---|---|
| Premium | claude-opus-4.6, claude-opus-4.5 | Architecture decisions, complex multi-file reasoning, security audits |
| Standard | claude-sonnet-4.6, claude-sonnet-4.5, gpt-5.2 | Most coding tasks, code review, test generation, documentation |
| Fast / Cheap | claude-haiku-4.5, gpt-5-mini, gpt-4.1 | File edits, boilerplate, classification, triage, simple summaries |
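A minimal sketch of routing tasks to the cheapest adequate tier. The task labels and the tier-to-task mapping are illustrative assumptions, not a fixed taxonomy; only the model names come from the table above.

```python
# Hypothetical tier map: route each task type to the cheapest adequate tier.
# Task labels and the mapping are assumptions; adapt to your pipeline.
MODEL_TIERS = {
    "premium": ["claude-opus-4.6", "claude-opus-4.5"],
    "standard": ["claude-sonnet-4.6", "claude-sonnet-4.5", "gpt-5.2"],
    "fast": ["claude-haiku-4.5", "gpt-5-mini", "gpt-4.1"],
}

TASK_TIER = {
    "architecture_review": "premium",
    "security_audit": "premium",
    "code_review": "standard",
    "test_generation": "standard",
    "file_edit": "fast",
    "boilerplate": "fast",
    "triage": "fast",
}

def pick_model(task_type: str) -> str:
    """Return the first model in the tier assigned to this task type."""
    tier = TASK_TIER.get(task_type, "standard")  # unknown tasks default to mid tier
    return MODEL_TIERS[tier][0]

print(pick_model("boilerplate"))  # claude-haiku-4.5
```

Defaulting unknown task types to the standard tier keeps the router safe: a misclassified task gets a capable model rather than a too-cheap one.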
Scan for:
| Metric | How to measure |
|---|---|
| Total tokens / task | Compare before and after context changes |
| Model mix | Tally which models are called per workflow |
| Prompt size distribution | Log avg/max token counts per call type |
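The prompt size distribution metric can be computed from call logs with a few lines of aggregation. The log record shape (`call_type`, `prompt_tokens`) is an assumption; substitute whatever your telemetry emits.

```python
from collections import defaultdict

# Sketch: aggregate per-call-type token counts to get the avg/max
# prompt size distribution. The log record fields are assumptions.
calls = [
    {"call_type": "pr_review", "prompt_tokens": 52_000},
    {"call_type": "pr_review", "prompt_tokens": 48_000},
    {"call_type": "doc_summary", "prompt_tokens": 4_000},
]

stats = defaultdict(list)
for c in calls:
    stats[c["call_type"]].append(c["prompt_tokens"])

for call_type, tokens in sorted(stats.items()):
    print(f"{call_type}: avg={sum(tokens) / len(tokens):.0f} max={max(tokens)}")
```

A per-call-type breakdown like this is what surfaces outliers such as a 50K-token review prompt hiding behind a low overall average.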
- Model downgrade
- Context pruning
- `view_range` instead of full-file reads
- Prompt deduplication
- Task batching
For each change:
- Change: Replace claude-opus on doc-summary with claude-haiku
- Before: ~4,000 tokens × $0.015/1K = $0.06/call
- After: ~4,000 tokens × $0.00025/1K = $0.001/call
- Savings: ~$0.059/call, ~$590/10K calls
Use approximate public pricing for estimation. Actual prices vary; check your provider dashboard.
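The arithmetic above can be sketched as a small estimator. The rates are the approximate example figures from this section, not current provider pricing:

```python
# Savings estimator for a model downgrade. Rates ($/1K tokens) are the
# approximate example figures above, not authoritative pricing.
def per_call_cost(tokens: int, price_per_1k: float) -> float:
    return tokens / 1000 * price_per_1k

before = per_call_cost(4_000, 0.015)     # opus-class rate
after = per_call_cost(4_000, 0.00025)    # haiku-class rate
per_10k = (before - after) * 10_000

print(f"${before:.3f}/call -> ${after:.4f}/call, ~${per_10k:,.0f} per 10K calls")
```

Keeping the estimator in code rather than in a spreadsheet makes it easy to re-run the audit as pricing or token counts change.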
| Priority | Criterion |
|---|---|
| High | Premium model on a task a fast model handles well |
| High | Context window > 50K tokens when shorter would suffice |
| Medium | Duplicate context passed on every call |
| Medium | Fleet agents with mismatched model tiers |
| Low | Minor prompt size variations |
## Cost Audit Report
### Summary
Estimated waste: ~$X/day at current scale
Top three opportunities: [list]
### Findings
#### [HIGH] Premium model for boilerplate generation
Location: [file or workflow name]
Issue: `claude-opus-4.6` used for all code generation including templates and stubs.
Recommendation: Use `claude-haiku-4.5` for boilerplate; reserve opus for complex tasks.
Estimated savings: ~80% cost reduction on boilerplate tasks.
#### [MEDIUM] Entire codebase passed as context on every PR review
...
| Pattern | Fix |
|---|---|
| Entire conversation history on every call | Summarize old context, keep recent turns |
| Full file reads when only one function matters | Use view_range for targeted reads |
| Premium model for all parallel agents in fleet | Assign tier per task type |
| Same instructions repeated in every prompt | Move to shared system prompt |
| No caching on static reference docs | Check if your API client supports prompt caching |
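The first fix above, summarizing old context while keeping recent turns, can be sketched as follows. The `summarize` helper is a hypothetical stand-in for a fast-model summarization call:

```python
# Sketch of "summarize old context, keep recent turns".
# summarize() is a placeholder; in practice, call a fast/cheap model.
def summarize(messages: list[dict]) -> str:
    return f"{len(messages)} earlier messages elided."

def prune_history(messages: list[dict], keep_recent: int = 6) -> list[dict]:
    """Replace all but the most recent turns with a single summary message."""
    if len(messages) <= keep_recent:
        return messages
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary = {
        "role": "system",
        "content": "Summary of earlier conversation: " + summarize(old),
    }
    return [summary] + recent
```

This bounds per-call context growth: a 40-turn conversation costs roughly the same as a 7-turn one, at the price of one cheap summarization call per pruning pass.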
orchestration/templates/orchestrator-template.md — model selection guidance in orchestration context