LLM provider adapter layer with streaming, multi-model parallel execution, cost tracking, and graceful fallback
Build a multi-provider LLM adapter layer with streaming, parallel execution, cost/token tracking, and graceful fallback across providers.
Architecture (be-architect): Design adapter layer; write the spec to .claude/specs/be-llm-{providers}.md
Implementation (be-implementer): Build adapters
Validation (be-validator): Schema enforcement
Provider Layer (be-provider): Multi-provider integration
Resilience (be-resilience): Fault tolerance
Testing (be-tester): Comprehensive tests
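The resilience phase above centers on circuit breakers around provider calls. A minimal sketch of one, assuming a simple consecutive-failure threshold and cooldown (class name and parameters are illustrative, not part of the spec):

```typescript
// Minimal circuit breaker sketch: opens after N consecutive failures,
// becomes half-open after a cooldown, and closes again on success.
type BreakerState = "closed" | "open" | "half-open";

class CircuitBreaker {
  private failures = 0;
  private openedAt = 0;

  constructor(
    private readonly maxFailures: number, // consecutive failures before opening
    private readonly cooldownMs: number,  // how long to stay open
  ) {}

  state(now: number = Date.now()): BreakerState {
    if (this.failures < this.maxFailures) return "closed";
    return now - this.openedAt >= this.cooldownMs ? "half-open" : "open";
  }

  async call<T>(fn: () => Promise<T>, now: number = Date.now()): Promise<T> {
    if (this.state(now) === "open") {
      // Fail fast so the caller can fall back to another provider.
      throw new Error("circuit open: provider temporarily skipped");
    }
    try {
      const result = await fn();
      this.failures = 0; // any success closes the circuit
      return result;
    } catch (err) {
      this.failures += 1;
      if (this.failures === this.maxFailures) this.openedAt = now;
      throw err;
    }
  }
}
```

Wrapping each provider's network call in its own breaker lets the fallback chain skip a failing provider immediately instead of waiting out its timeout.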
$ARGUMENTS.providers — LLM providers to integrate
$ARGUMENTS.features — Features to include (default: streaming,fallback,cost-tracking)
$ARGUMENTS.endpoint — API path (default: /api/v1/chat)

Start by spawning be-architect with:
Design an LLM integration layer for providers: $ARGUMENTS.providers
Features: ${ARGUMENTS.features || "streaming,fallback,cost-tracking"}
Endpoint: ${ARGUMENTS.endpoint || "/api/v1/chat"}
Requirements:
- Unified LLMProvider interface across all providers
- SSE streaming for real-time responses
- Automatic fallback when primary provider fails
- Per-request token counting and cost tracking
- Rate limiting per provider (token bucket)
- Circuit breakers on all provider calls
- Zod validation on all inputs/outputs
- RFC 9457 errors for provider failures
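The first three requirements can be sketched together: a unified streaming interface plus an ordered fallback chain. All names here (LLMProvider, completeWithFallback, CompletionChunk) are illustrative assumptions, not mandated by this spec:

```typescript
// Hypothetical unified interface each provider adapter implements.
interface CompletionChunk {
  text: string;
  inputTokens?: number;  // populated by adapters that report usage
  outputTokens?: number;
}

interface LLMProvider {
  name: string;
  // Streams completion chunks; throws on provider failure.
  complete(prompt: string): AsyncIterable<CompletionChunk>;
}

// Graceful fallback: try providers in order until one completes.
// Note: this sketch restarts the stream on a mid-stream failure,
// so chunks already yielded by a failed provider may repeat.
async function* completeWithFallback(
  providers: LLMProvider[],
  prompt: string,
): AsyncIterable<CompletionChunk> {
  let lastError: unknown;
  for (const provider of providers) {
    try {
      for await (const chunk of provider.complete(prompt)) {
        yield chunk;
      }
      return; // this provider finished successfully
    } catch (err) {
      lastError = err; // record and try the next provider
    }
  }
  throw lastError ?? new Error("no providers configured");
}
```

The route handler can pipe this async iterable directly to an SSE response, emitting one `data:` event per chunk.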
Provide the architecture spec, then delegate:
1. be-implementer for route/handler skeleton
2. be-provider for provider adapters (parallel with implementer)
3. be-validator for schemas (parallel)
4. be-resilience for circuit breakers and retries
5. be-tester for contract and streaming tests
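The per-provider token-bucket requirement in the list above can be sketched as a small class, one instance per provider (capacity and refill rate below are illustrative defaults, not from the spec):

```typescript
// Minimal token-bucket rate limiter; instantiate one per provider.
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private readonly capacity: number,     // maximum burst size
    private readonly refillPerSec: number, // sustained requests per second
  ) {
    this.tokens = capacity;
    this.lastRefill = Date.now();
  }

  // Returns true and consumes a token if the call is allowed right now.
  tryAcquire(now: number = Date.now()): boolean {
    const elapsedSec = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec);
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}
```

When `tryAcquire` returns false for the primary provider, the fallback chain can consult the next provider's bucket rather than queueing, which keeps latency bounded.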