Multi-LLM provider system supporting Anthropic, OpenAI, Google, Cohere, and Ollama with load balancing and cost optimization
The @claude-flow/providers module is the Multi-LLM Provider System for Claude Flow v3, supporting Anthropic (Claude), OpenAI (GPT-4o, o1), Google (Gemini), Cohere (Command R+), and Ollama (local models). It provides load balancing, automatic failover, request caching, cost optimization (85%+ savings), circuit breaker protection, and health monitoring.
Supported models by provider:

| Provider | Models |
|---|---|
| Anthropic | claude-3-5-sonnet-20241022, claude-3-5-sonnet-latest, claude-3-opus-20240229, claude-3-sonnet-20240229, claude-3-haiku-20240307 |
| OpenAI | gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-4, gpt-3.5-turbo, o1-preview, o1-mini, o3-mini |
| Google | gemini-2.0-flash, gemini-1.5-pro, gemini-1.5-flash, gemini-pro |
| Cohere | command-r-plus, command-r, command-light, command |
| Ollama | llama3.2, llama3.1, mistral, mixtral, codellama, phi-4, deepseek-coder |

Create a manager with one or more providers:

```typescript
import { createProviderManager } from '@claude-flow/providers';

const manager = createProviderManager({
  providers: [
    { provider: 'anthropic', apiKey: '...', model: 'claude-3-5-sonnet-latest' },
    { provider: 'openai', apiKey: '...', model: 'gpt-4o' },
    { provider: 'ollama', apiUrl: 'http://localhost:11434', model: 'llama3.2' },
  ],
  defaultProvider: 'anthropic',
  loadBalancing: {
    enabled: true,
    strategy: 'cost-based', // 'round-robin' | 'least-loaded' | 'latency-based' | 'cost-based'
  },
  fallback: {
    enabled: true,
    maxAttempts: 3,
  },
  cache: {
    enabled: true,
    ttl: 3600,
    maxSize: 1000,
  },
  costOptimization: {
    enabled: true,
    maxCostPerRequest: 0.50,
  },
});
```
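With the cache enabled, a repeated identical request should be answered from the cache within the TTL. The sketch below assumes the cache keys on the full request payload, which is typical behavior but not confirmed here:

```typescript
const req = {
  messages: [{ role: 'user' as const, content: 'One-line summary of HTTP/2?' }],
  model: 'claude-3-5-sonnet-latest',
};

const first = await manager.complete(req);   // goes to the provider
const second = await manager.complete(req);  // should be served from the cache within the TTL
console.log(first.latency, second.latency);  // a cache hit should be near-instant
```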
Every provider implements ILLMProvider:
```typescript
interface ILLMProvider extends EventEmitter {
  readonly name: LLMProvider;
  readonly capabilities: ProviderCapabilities;

  initialize(): Promise<void>;
  complete(request: LLMRequest): Promise<LLMResponse>;
  streamComplete(request: LLMRequest): AsyncIterable<LLMStreamEvent>;
  listModels(): Promise<LLMModel[]>;
  getModelInfo(model: LLMModel): Promise<ModelInfo>;
  validateModel(model: LLMModel): boolean;
  healthCheck(): Promise<HealthCheckResult>;
  getStatus(): ProviderStatus;
  estimateCost(request: LLMRequest): Promise<CostEstimate>;
  getUsage(period?: UsagePeriod): Promise<UsageStats>;
  destroy(): void;
}
```
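The interface implies a simple lifecycle: initialize, serve requests, monitor health, destroy. A minimal sketch, assuming you hold a direct `ILLMProvider` reference and that `HealthCheckResult` exposes a boolean `healthy` field (an assumption):

```typescript
async function withProvider(provider: ILLMProvider): Promise<void> {
  await provider.initialize();

  const health = await provider.healthCheck();
  if (!health.healthy) { // assumed shape of HealthCheckResult
    throw new Error(`${provider.name} failed its health check`);
  }

  console.log(provider.getStatus()); // current load, error counts, etc.
  provider.destroy();                // release sockets/timers when done
}
```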
A basic completion returns the generated text along with token usage, cost, and latency metadata:

```typescript
const response = await manager.complete({
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'Hello!' },
  ],
  model: 'claude-3-5-sonnet-latest',
  temperature: 0.7,
  maxTokens: 1024,
});

// response.content — Generated text
// response.usage — { promptTokens, completionTokens, totalTokens }
// response.cost — { promptCost, completionCost, totalCost, currency }
// response.latency — Response time in ms
// response.finishReason — 'stop' | 'length' | 'tool_calls' | 'content_filter'
```
Streaming yields typed events until the completion finishes:

```typescript
for await (const event of manager.streamComplete(request)) {
  switch (event.type) {
    case 'content': console.log(event.delta?.content); break;
    case 'tool_call': /* handle tool call */ break;
    case 'done': /* completion finished */ break;
    case 'error': /* handle error */ break;
  }
}
```
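To reassemble the full completion, accumulate the content deltas as they arrive. A small sketch using the event shape above (the `error` field on error events is an assumption):

```typescript
let text = '';
for await (const event of manager.streamComplete(request)) {
  if (event.type === 'content' && event.delta?.content) {
    text += event.delta.content; // append each streamed chunk
  } else if (event.type === 'error') {
    throw event.error;           // assumed field on error events
  }
}
console.log(text);               // full completion once 'done' arrives
```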
Tool calling uses the function-calling convention:

```typescript
const response = await manager.complete({
  messages: [...],
  tools: [{
    type: 'function',
    function: {
      name: 'get_weather',
      description: 'Get current weather',
      parameters: {
        type: 'object',
        properties: { location: { type: 'string' } },
        required: ['location'],
      },
    },
  }],
  toolChoice: 'auto', // 'auto' | 'none' | 'required' | { function: { name } }
});
```
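Completing the round trip — executing the requested tool and returning its result for a final answer — typically looks like the sketch below. The `toolCalls` array, the `call.id`/`call.function.arguments` fields, and the `role: 'tool'` result message follow the common OpenAI-style convention and are assumptions here, as is the local `getWeather` helper; `messages` is the conversation so far:

```typescript
if (response.finishReason === 'tool_calls') {
  for (const call of response.toolCalls ?? []) {       // assumed field on LLMResponse
    const args = JSON.parse(call.function.arguments);  // e.g. { location: 'Berlin' }
    const result = await getWeather(args.location);    // hypothetical local implementation

    // Send the tool result back so the model can produce the final answer.
    const followUp = await manager.complete({
      messages: [
        ...messages,
        { role: 'tool', toolCallId: call.id, content: JSON.stringify(result) },
      ],
    });
    console.log(followUp.content);
  }
}
```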
Per-request cost constraints cap spend and steer model selection:

```typescript
const response = await manager.complete({
  messages: [...],
  costConstraints: {
    maxCost: 0.10,
    preferredModels: ['claude-3-haiku-20240307', 'gpt-4o-mini'],
  },
});
```
Each provider advertises its capabilities:

```typescript
interface ProviderCapabilities {
  supportedModels: LLMModel[];
  maxContextLength: Record<string, number>;
  maxOutputTokens: Record<string, number>;
  supportsStreaming: boolean;
  supportsToolCalling: boolean;
  supportsSystemMessages: boolean;
  supportsVision: boolean;
  supportsAudio: boolean;
  supportsFineTuning: boolean;
  supportsEmbeddings: boolean;
  supportsBatching: boolean;
  rateLimit?: {
    requestsPerMinute: number;
    tokensPerMinute: number;
    concurrentRequests: number;
  };
  pricing: Record<string, {
    promptCostPer1k: number;
    completionCostPer1k: number;
    currency: string;
  }>;
}
```
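These fields make it easy to gate requests before dispatch, for example rejecting prompts that exceed a model's context window. A minimal sketch using only the fields above:

```typescript
function canHandle(provider: ILLMProvider, model: LLMModel, promptTokens: number): boolean {
  const caps = provider.capabilities;
  if (!provider.validateModel(model)) return false;
  const limit = caps.maxContextLength[model];
  return limit !== undefined && promptTokens < limit; // prompt must fit the context window
}
```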
Cost estimation and usage tracking are available per provider:

```typescript
const estimate = await provider.estimateCost(request);
// { estimatedPromptTokens, estimatedCompletionTokens, estimatedTotalTokens,
//   estimatedCost: { prompt, completion, total, currency }, confidence }

const usage = await provider.getUsage('day');
// { period, requests, tokens, cost, errors, averageLatency, modelBreakdown }
```
Typed errors for precise error handling:
Provider errors carry `code`, `provider`, `statusCode`, and `retryable` fields; rate-limit errors additionally carry a `retryAfter` duration in seconds:

```typescript
import { isRateLimitError, isLLMProviderError } from '@claude-flow/providers';

const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

try {
  await provider.complete(request);
} catch (error) {
  if (isRateLimitError(error)) {
    await sleep(error.retryAfter * 1000); // retryAfter is in seconds
  } else if (isLLMProviderError(error)) {
    console.error(`[${error.provider}] ${error.code}: retryable=${error.retryable}`);
  }
}
```
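These fields compose naturally into a retry wrapper. The sketch below honors the `retryAfter` hint when present and otherwise backs off exponentially, reusing the `sleep` helper above; the backoff policy itself is illustrative, not part of the module:

```typescript
async function completeWithRetry(request: LLMRequest, maxAttempts = 3): Promise<LLMResponse> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await provider.complete(request);
    } catch (error) {
      const retryable =
        isRateLimitError(error) || (isLLMProviderError(error) && error.retryable);
      if (!retryable || attempt >= maxAttempts) throw error;

      // Prefer the provider's retryAfter hint; fall back to exponential backoff.
      const delayMs = isRateLimitError(error) && error.retryAfter
        ? error.retryAfter * 1000
        : 2 ** attempt * 250;
      await sleep(delayMs);
    }
  }
}
```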
The BaseProvider abstract class gives every provider the shared infrastructure described above — circuit breaker protection, request caching, health monitoring, and usage/cost tracking — so each provider implementation only supplies its API specifics.

The manager distributes requests using one of four load-balancing strategies:
| Strategy | Description |
|---|---|
| round-robin | Rotate through providers sequentially |
| least-loaded | Route to provider with lowest active requests |
| latency-based | Route to fastest responding provider |
| cost-based | Route to cheapest provider meeting requirements |
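For intuition, the cost-based strategy can be approximated with the public estimateCost API: ask each candidate provider for an estimate and route to the cheapest. This is a sketch of the idea, not the manager's internal implementation:

```typescript
async function cheapestProvider(
  providers: ILLMProvider[],
  request: LLMRequest,
): Promise<ILLMProvider> {
  const ranked = await Promise.all(
    providers.map(async (p) => ({
      provider: p,
      total: (await p.estimateCost(request)).estimatedCost.total,
    })),
  );
  ranked.sort((a, b) => a.total - b.total); // cheapest first
  return ranked[0].provider;
}
```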