Ship agents that fix themselves. Kalibr learns what's working as your agents run in production and routes them around failures, degradations, and cost spikes before you know they're happening.
You define candidate execution paths — model + tools + parameters. Kalibr figures out which one wins for each task from real production telemetry. When a path degrades at 3am, your agents are already on the next best path. No alerts. No debugging. No you.
For anyone who hardcodes model="gpt-4o" and wants something that adapts:

pip install kalibr
Get credentials at https://dashboard.kalibr.systems/settings
export KALIBR_API_KEY="your-api-key"
export KALIBR_TENANT_ID="your-tenant-id"
openclaw plugins install @kalibr/openclaw
from kalibr import Router

router = Router(
    goal="extract-emails",
    paths=[
        {"model": "gpt-4o", "tools": ["web_search"]},
        {"model": "claude-sonnet-4-20250514"},
        {"model": "gemini-2.0-flash", "params": {"temperature": 0.2}},
    ],
)

response = router.completion(
    messages=[{"role": "user", "content": "Extract emails from this page..."}]
)

# This is how Kalibr learns: tell it what worked
router.report(success="@" in response.choices[0].message.content)
Kalibr routes the full execution path — model + tools + parameters — not just the model. After ~20 outcomes it knows what's winning. After 50 it's locked in and adapting.
Skip manual reporting by defining success inline:

router = Router(
    goal="extract-emails",
    paths=["gpt-4o", "claude-sonnet-4-20250514", "gemini-2.0-flash"],
    success_when=lambda output: "@" in output,
)

# Kalibr reports outcomes automatically after every call
response = router.completion(messages=[...])
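Inline success_when is just the manual report call automated. A minimal pure-Python sketch of that pattern, with hypothetical stand-ins for the router internals (not Kalibr's actual API):

```python
# Sketch: after every call, evaluate the predicate on the output
# and report the outcome automatically. All names here are illustrative.
def completion_with_auto_report(call, success_when, report):
    output = call()
    report(success=success_when(output))
    return output

outcomes = []
result = completion_with_auto_report(
    call=lambda: "alice@example.com",           # stands in for the LLM call
    success_when=lambda output: "@" in output,  # same predicate as above
    report=lambda success: outcomes.append(success),
)
```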
OpenRouter / LiteLLM routing: Model proxy. Routes based on price, speed, availability. Doesn't know if the response was actually good for your task.
Fallback systems (LangChain ModelFallbackMiddleware): Reactive. Waits for a failure, then tries the next model. You already lost that request.
Kalibr: Learns from your actual production telemetry — per task, per path. Routes to what's working before anything breaks. 10% canary traffic keeps testing alternatives so Kalibr catches degradation before your users do.
LangChain integration: pip install langchain-kalibr gives you a drop-in ChatModel; use ChatKalibr as any agent's llm.

Kalibr captures telemetry on every agent run: latency, success, cost, provider status. It uses Thompson Sampling to balance exploration (trying paths) vs. exploitation (using the best).
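A minimal sketch of Thompson Sampling with a canary slice, assuming Beta posteriors over per-path success rates. This is illustrative only, not Kalibr's implementation; the class and numbers are hypothetical:

```python
import random

# Each path keeps a Beta(successes + 1, failures + 1) posterior over its
# success rate. To route, sample once from every posterior and pick the
# highest sample; a 10% canary slice goes to a uniformly random path so
# alternatives keep getting tested.
class PathStats:
    def __init__(self, name):
        self.name = name
        self.successes = 0
        self.failures = 0

    def sample(self):
        return random.betavariate(self.successes + 1, self.failures + 1)

def choose_path(paths, canary_rate=0.10):
    if random.random() < canary_rate:
        return random.choice(paths)              # canary: keep exploring
    return max(paths, key=lambda p: p.sample())  # exploit the likely best

# Hypothetical telemetry: one path observed at ~90% success, one at ~40%.
paths = [PathStats("gpt-4o"), PathStats("claude-sonnet-4-20250514")]
paths[0].successes, paths[0].failures = 45, 5
paths[1].successes, paths[1].failures = 20, 30
picks = [choose_path(paths).name for _ in range(1000)]
```

With posteriors that far apart, nearly all non-canary traffic lands on the stronger path, while the canary slice keeps probing the other one.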
Success rate always dominates. Kalibr never sacrifices quality for cost.
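One way to express that dominance is a lexicographic comparison: rank by success rate first, and only break ties with cost. A hedged sketch (the tuple shape and names are hypothetical, not Kalibr's scoring code):

```python
# Pick the path with the highest success rate; among equals, the cheapest.
def best_path(stats):
    # stats: list of (name, success_rate, cost_per_call) tuples
    return max(stats, key=lambda s: (s[1], -s[2]))[0]

winner = best_path([
    ("cheap-but-flaky", 0.62, 0.0001),
    ("pricier-but-reliable", 0.97, 0.0030),
])
```

Because the success rate occupies the first slot of the sort key, no cost advantage can promote a lower-quality path.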