Track AI provider pricing, model capabilities, monthly costs, and usage. Includes cost optimization recommendations for oh-my-openagent configuration.
Last Updated: March 28, 2026
User: kevinhill
Budget Target: $100-150/month (AFTER Claude Sonnet 5 releases)
Current: ~$278/month (Claude Max 20x $200 + Fireworks $28 + others)
Claude Status: Max 20x ($200/mo) - Waiting for Sonnet 5 "Fennec" before dropping
Expected Post-Sonnet-5: ~$130-150/month
Annual Savings Potential: ~$1,500-1,800 (after Sonnet 5)
| Provider | Plan | Monthly Cost | Status | Value Rating | Notes |
|---|---|---|---|---|---|
| Claude (Anthropic) | Max 20x |
| $200 |
| ⏳ KEEP UNTIL Sonnet 5 |
| ⭐⭐⭐⭐⭐ |
| Waiting for "Fennec" release with Agent Swarm |
| Fireworks AI | Fire Pass | $28 | ✅ ADDED | ⭐⭐⭐⭐⭐ | Backup primary orchestrator (Kimi 200t/s) |
| GitHub Copilot | Pro | $10 | ✅ KEEP | ⭐⭐⭐⭐⭐ | IDE integration |
| OpenAI | ChatGPT Plus | $20 | ✅ KEEP | ⭐⭐⭐⭐⭐ | GPT-5.4 deep reasoning |
| Gemini Advanced | ~$20 | ✅ KEEP | ⭐⭐⭐⭐ | Google ecosystem user |
| Z.ai (GLM) | - | $0 | ❌ CANCEL NOW | ⭐ | Infrastructure trash, quality degraded |
| MiniMax | - | $0 | ❌ CANCEL | N/A | Use OpenCode Go or pay-as-you-go |
| Scenario | Monthly Cost | Annual Cost | Notes |
|---|---|---|---|
| CURRENT | ~$278 | ~$3,336 | Claude Max 20x ($200) + Fireworks + others |
| After Sonnet 5 | ~$130-150 | ~$1,560-1,800 | Drop to Sonnet tier, simplify stack |
| Savings | ~$130-150/mo | ~$1,500-1,800/yr | When Sonnet 5 drops |
You're paying $200/month for:
The bet: Claude Sonnet 5 "Fennec" will have:
When Sonnet 5 drops: Re-evaluate entire stack. Could drop to Sonnet tier + simplify.
accounts/fireworks/routers/kimi-k2p5-turbo
| Type | Price |
|---|---|
| Input | $0.99/1M tokens |
| Cached Input | $0.16/1M tokens |
| Output | $4.94/1M tokens |
Break-even: ~6M tokens/month vs Fire Pass
| Model | Provider | Speed | Coding | Reasoning | Vision | Cost Level | Best For |
|---|---|---|---|---|---|---|---|
| Kimi K2.5 Turbo | Fireworks | ⚡⚡⚡⚡⚡ (200t/s) | 76.8% SWE-Bench | High | ✅ | Very Low | Orchestration, daily coding |
| Kimi K2.5 | OpenCode Go/Moonshot | ⚡⚡⚡ (60t/s) | 76.8% SWE-Bench | High | ✅ | Very Low | Claude alternative |
| GPT-5.4 | OpenAI | ⚡⚡⚡ (40t/s) | 80%+ SWE-Bench | Very High | ✅ | Medium | Deep reasoning, architecture |
| GPT-5.4 Mini | OpenAI | ⚡⚡⚡⚡ (100t/s) | 75% SWE-Bench | High | ❌ | Very Low | Quick tasks |
| Claude Opus 4.6 | Anthropic | ⚡⚡ (25t/s) | 80.8% SWE-Bench | Very High | ✅ | High | Complex debugging |
| Claude Sonnet 4.6 | Anthropic | ⚡⚡⚡ (50t/s) | 75% SWE-Bench | High | ✅ | Medium | General tasks |
| Gemini 3.1 Pro | ⚡⚡⚡ (60t/s) | 75% SWE-Bench | High | ✅ | Low | Frontend, visual tasks | |
| Gemini 3 Flash | ⚡⚡⚡⚡ (150t/s) | 70% SWE-Bench | Medium | ✅ | Very Low | Documentation, fast tasks | |
| MiniMax M2.7 | MiniMax/OpenCode | ⚡⚡⚡⚡ (120t/s) | 80.2% SWE-Bench | High | ❌ | Very Low | Utility, code search |
| MiniMax M2.7 Highspeed | MiniMax | ⚡⚡⚡⚡⚡ (200t/s) | 80.2% SWE-Bench | High | ❌ | Low | Fast utility |
| GLM 5 | Z.ai/OpenCode | ⚡⚡⚡ (50t/s) | 77.8% SWE-Bench | High | ❌ | Low | Claude-like orchestration |
| Grok Code Fast 1 | xAI/GitHub | ⚡⚡⚡⚡⚡ (250t/s) | 70% SWE-Bench | Medium | ❌ | Very Low | Code grep, search |
| Model | Input | Output | Cache Read |
|---|---|---|---|
| Kimi K2.5 (regular) | $0.60 | $2.00 | - |
| Kimi K2.5 Turbo | $0.99 | $4.94 | $0.16 |
| GPT-5.4 | $2.50 | $15.00 | - |
| GPT-5.4 Mini | $0.25 | $2.00 | - |
| GPT-5-Nano | $0.05 | $0.40 | - |
| Claude Opus 4.6 | $5.00 | $25.00 | $0.50 |
| Claude Sonnet 4.6 | $3.00 | $15.00 | $0.30 |
| Claude Haiku 4.5 | $1.00 | $5.00 | $0.10 |
| Gemini 3.1 Pro | $2.00 | $12.00 | - |
| Gemini 3 Flash | $0.10 | $0.40 | - |
| MiniMax M2.7 | $0.30 | $1.20 | $0.06 |
| MiniMax M2.7 Highspeed | $0.60 | $2.40 | $0.06 |
| GLM 5 | $1.00 | $3.20 | $0.11 |
Configuration:
{
"model": "fireworks/accounts/fireworks/routers/kimi-k2p5-turbo",
"baseURL": "https://api.fireworks.ai/inference/v1"
}
API Pricing:
Capacity Constraints — Undersized infrastructure. Can't scale fast enough when demand spikes → "Model unavailable" errors
Degraded Model Quality — Reddit r/ZaiGLM reports confirm GLM-5 quality has declined since launch:
30-Second Idle Timeouts — GitHub issues (vercel/ai #12949) show Z.ai drops connections after 30s idle, killing long agentic workflows
"Improvements" That Break Things — Z.ai pushed updates they called "performance improvements" that actually degraded quality for Max plan users
Beijing-Based Lab — Zhipu AI is China-based. Scaling global infrastructure with international demand + Chinese regulatory constraints = unreliable service outside China
Bottom line: Cheap frontier AI is only good if quality holds. Z.ai isn't holding. Reddit thread: "Don't buy GLM coding plans. Quality is atrocious. It's becoming worse everyday."
Model ID spotted: claude-sonnet-5@20260203 in Google Vertex AI infrastructure
| Feature | Details |
|---|---|
| Codename | "Fennec" |
| Performance | Outperforms Claude Opus 4.5 internally |
| Price | ~50% cheaper than Opus 4.5 |
| Context Window | 1 million tokens (vs 200K now) |
| Killer Feature | "Dev Team" / "Agent Swarm" mode — spawns multiple specialized agents (architect, backend, frontend, QA) that work in parallel |
| SWE-Bench | 80.9% (far surpassing existing models) |
| Release | Imminent (likely Q2 2026) |
Why Wait: Sonnet 5 could be the Claude that makes dropping everything else actually viable:
When Sonnet 5 drops: Re-evaluate entire stack. Could drop from Max 20x to Sonnet tier.
## Month: [Month Year]
### Daily Average Usage
- Hours/day: ___
- Requests/day: ___
- Primary model: ___
### Costs This Month
| Provider | Planned | Actual | Notes |
|----------|---------|--------|-------|
| Claude Max 20x | $200 | $___ | Waiting for Sonnet 5 |
| Fireworks Fire Pass | $28 | $___ | Backup orchestrator |
| GitHub Copilot | $10 | $___ | |
| OpenAI | $20 | $___ | API usage: $___ |
| Google Gemini | $20 | $___ | |
| MiniMax (OpenCode Go) | $10 | $___ | If needed |
| **TOTAL** | **~$288** | **$___** | **Current spend** |
### Sonnet 5 Watch
- [ ] Release announced
- [ ] Benchmarks verified
- [ ] Pricing confirmed
- [ ] Agent Swarm tested
- [ ] Decision: Drop Max 20x?
### If Sonnet 5 is Good - New Stack:
| Provider | Expected Cost |
|----------|---------------|
| Claude Sonnet 5 | ~$50-75? |
| Fireworks | $28 |
| Copilot | $10 |
| OpenAI | $20 |
| Gemini | $20 |
| **TOTAL** | **~$128-153** |
### Model Usage Breakdown
| Model | % Usage | Tokens Used | Cost |
|-------|---------|-------------|------|
| Claude Opus 4.6 (Max 20x) | ___% | ___M | $200 |
| Kimi K2.5 Turbo | ___% | Unlimited | $28 |
| GPT-5.4 | ___% | ___M | $___ |
| Gemini Pro | ___% | ___M | $___ |
| MiniMax M2.7 | ___% | ___M | $___ |
### Optimizations Made / Planned
- [x] **CANCELLED Z.ai GLM** — Infrastructure trash, quality degraded
- [x] **CANCELLED MiniMax subscription** — Use OpenCode Go or pay-as-you-go
- [x] **ADDED Fireworks Fire Pass** — $28/mo unlimited Kimi 200t/s
- [ ] **PENDING: Drop Claude Max 20x** — Waiting for Sonnet 5 "Fennec"
- If Sonnet 5 has Agent Swarm → Could replace OMO orchestration
- If 1M context works → Could simplify entire stack
- If ~$50-75/mo → Save $125-150/month
- [ ] **Evaluate Gemini need** — Do you actually use it beyond Google ecosystem?
### Notes
___
| Agent | Primary Model | Fallback | Monthly Cost |
|---|---|---|---|
| Sisyphus (Orchestrator) | 🔥 Claude Opus 4.6 (Max 20x) | Fireworks Kimi Turbo | $200 |
| Prometheus (Planner) | Claude Opus 4.6 | Fireworks Kimi | Included |
| Hephaestus (Deep work) | OpenAI GPT-5.4 | Gemini Pro | $20 + API |
| Oracle (Consultant) | OpenAI GPT-5.4 | Gemini Pro | API usage |
| Explore/Librarian | Fireworks Kimi / MiniMax | Copilot Grok | $28 |
| Frontend | Gemini 3.1 Pro | Fireworks Kimi | $20 |
Current Total: ~$278/month
| Agent | Primary Model | Why |
|---|---|---|
| Everything | 🚀 Claude Sonnet 5 "Fennec" | Agent Swarm built-in? 1M context? 50% cheaper? |
| Backup | Fireworks Kimi Turbo | $28/mo unlimited |
| IDE | GitHub Copilot Pro | $10 |
| Reasoning | GPT-5.4 (if needed) | $20 |
| Gemini Advanced | $20 (if still needed) |
Potential Total: ~$130-150/month
Savings: $130-150/month when Sonnet 5 drops
You're paying $200/month for Max 20x because:
If Sonnet 5 has:
Risk: Sonnet 5 could be delayed or not live up to leaks. But at $200/month, the potential payoff is worth the wait.
| Provider | Billing | Support | Docs |
|---|---|---|---|
| Fireworks | Billing | [email protected] | Docs |
| GitHub Copilot | Settings | GitHub Support | Docs |
| OpenAI | Billing | OpenAI Help | API Docs |
| Google Gemini | Google One | Google Support | Gemini Docs |
| Anthropic Claude | Account | Anthropic Support | Claude Docs |
| Z.ai | Z.ai Billing | Z.ai Support | GLM Docs |
| MiniMax | MiniMax Platform | MiniMax Support | API Docs |
| OpenCode | OpenCode | Discord/Discord | Docs |
| Use Case | Primary Model | Provider | Monthly Cost | Why |
|---|---|---|---|---|
| Orchestration | 🔥 Claude Opus 4.6 (Max 20x) | Anthropic | $200 | You hit Pro limits, need 220K tokens/5hr |
| Deep Reasoning | GPT-5.4 | OpenAI | $20 + API | Architecture, Oracle, Hephaestus |
| Frontend/UI | Gemini 3.1 Pro | $20 | Google ecosystem user | |
| Backup/Fast | Kimi K2.5 Turbo | Fireworks | $28 | Unlimited, 200 tok/s, testing as backup |
| IDE | Copilot | GitHub | $10 | Completions |
| Utility | MiniMax M2.7 | OpenCode Go | $10 | If needed |
Total: ~$288/month
| Use Case | Primary Model | Expected Cost |
|---|---|---|
| Everything | 🚀 Claude Sonnet 5 "Fennec" | ~$50-75? |
| Backup | Fireworks Kimi Turbo | $28 |
| IDE | Copilot Pro | $10 |
| Specialized | GPT-5.4 (if needed) | $20 |
| Gemini Advanced (if still needed) | $20 |
Potential Total: ~$130-150/month
Savings: $130-150/month ($1,500-1,800/year)
Generated by Claude for kevinhill
Tracker Version: 1.0
Update Frequency: Monthly or when pricing changes