Track token usage and estimated cost per agent session. Enforce budget ceilings and optimize costs by routing tasks to the appropriate model tier.
Steps
- Record input and output token counts for every LLM invocation and tool call. Calculate cost using the model's pricing table.
- Aggregate costs by agent session, task, and time period (daily, weekly). Store in
.harness/costs/.
- Load per-task budget ceilings from
.harness/agents/*.json. Post a warning at 80% of budget. Hard-stop the agent at 100% and escalate to Slack with a spend summary.
- Route tasks to the appropriate model tier based on complexity:
- Haiku: trivial tasks (typos, formatting, simple edits)
- Sonnet: standard tasks (feature implementation, bug fixes)
- Opus: complex tasks (architecture decisions, multi-file refactors)
- Generate per-agent and aggregate cost summaries for investor update reports.
- Store historical cost data for trend analysis and budget forecasting.
Anti-Patterns