Multi-model automatic fallback system. Monitors model availability and automatically falls back to backup models when the primary model fails. Supports MiniMax, Kimi, Zhipu and other OpenAI-compatible APIs. Use when: (1) Primary model API is unavailable, (2) Model response time is too slow, (3) Rate limit exceeded, (4) Need to optimize costs by using cheaper models for simple tasks.
Multi-model automatic fallback system for AI agents
This skill provides automatic model fallback functionality for OpenClaw agents. When the primary model fails (unavailable, slow, or rate-limited), it automatically switches to backup models in a predefined priority order.
| Provider | Model | Context | Use Case |
|---|---|---|---|
| MiniMax | M2.5 | 200K | Primary (reasoning) |
| MiniMax | M2.1 | 200K | Backup |
| Kimi | K2.5 | 256K | Long documents |
| Kimi | K2 | 128K | Standard |
| Zhipu | GLM-4-Air | 128K | Low cost |
| Zhipu | GLM-4-Flash | 1M | High volume |
{
  "fallback_chain": [
    {
      "provider": "minimax-portal",
      "model": "MiniMax-M2.5",
      "priority": 1,
      "timeout": 30,
      "max_retries": 3
    },
    {
      "provider": "moonshot",
      "model": "kimi-k2.5",
      "priority": 2,
      "timeout": 30,
      "max_retries": 2
    },
    {
      "provider": "zhipu",
      "model": "glm-4-air",
      "priority": 3,
      "timeout": 20,
      "max_retries": 2
    }
  ]
}
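As a rough illustration of how a chain like the one above could be consumed, the following Python sketch parses the JSON and orders the entries so that priority 1 is tried first. The `load_chain` helper and the choice of required keys (`provider`, `model`, `priority`) are assumptions made for this example; they are not part of the skill's documented interface.

```python
import json

# Entries are deliberately listed out of order to show the sorting step.
CONFIG_TEXT = """
{
  "fallback_chain": [
    {"provider": "zhipu", "model": "glm-4-air", "priority": 3, "timeout": 20, "max_retries": 2},
    {"provider": "minimax-portal", "model": "MiniMax-M2.5", "priority": 1, "timeout": 30, "max_retries": 3},
    {"provider": "moonshot", "model": "kimi-k2.5", "priority": 2, "timeout": 30, "max_retries": 2}
  ]
}
"""

# Assumption: these three keys are mandatory; timeout and max_retries are optional.
REQUIRED_KEYS = {"provider", "model", "priority"}


def load_chain(text):
    """Parse the config text and return entries sorted by ascending priority."""
    chain = json.loads(text)["fallback_chain"]
    for entry in chain:
        missing = REQUIRED_KEYS - set(entry)
        if missing:
            raise ValueError(f"fallback entry missing keys: {sorted(missing)}")
    # Assumption: a lower priority number means the model is tried earlier.
    return sorted(chain, key=lambda e: e["priority"])


chain = load_chain(CONFIG_TEXT)
print([e["model"] for e in chain])
# prints ['MiniMax-M2.5', 'kimi-k2.5', 'glm-4-air']
```

Sorting at load time means the order of entries in the file does not matter, only their `priority` values.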
| Variable | Required | Description |
|---|---|---|
| MODEL_FALLBACK_ENABLED | No | Enable/disable fallback (default: true) |
| MODEL_FALLBACK_LOG_LEVEL | No | Log level: debug, info, warn, error |
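
The two variables above could be read along these lines. This is a minimal sketch, not the skill's actual parsing: the `read_settings` helper, the set of strings treated as "false", and the `info` default for the log level are all assumptions (the table only documents the default for `MODEL_FALLBACK_ENABLED`).

```python
import os

VALID_LEVELS = ("debug", "info", "warn", "error")

# Assumption: these spellings disable fallback; anything else leaves it enabled.
FALSE_VALUES = ("false", "0", "no", "off")


def read_settings(env=None):
    """Return the fallback settings from an environment-like mapping."""
    env = os.environ if env is None else env

    raw_enabled = env.get("MODEL_FALLBACK_ENABLED", "true").strip().lower()
    enabled = raw_enabled not in FALSE_VALUES

    # Assumption: "info" is used when no level is set.
    level = env.get("MODEL_FALLBACK_LOG_LEVEL", "info").strip().lower()
    if level not in VALID_LEVELS:
        raise ValueError(f"unknown log level: {level!r}")

    return {"enabled": enabled, "log_level": level}


print(read_settings({"MODEL_FALLBACK_ENABLED": "false"}))
# prints {'enabled': False, 'log_level': 'info'}
```

Passing a plain dictionary instead of `os.environ` keeps the function easy to test without touching the real environment.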
The skill handles model failures automatically, so no explicit calls are needed. The commands below are for manual control and inspection.
# Trigger a model call (fallback happens automatically on failure)
# Force fallback to next model
/scripts/model-fallback.sh --force-next
# Check current model status
/scripts/model-fallback.sh --status
# Reset to primary model
/scripts/model-fallback.sh --reset
Edit config.json to customize the fallback chain:
{
  "fallback_chain": [
    {"provider": "...", "model": "...", "priority": 1}
  ],
  "health_check": {
    "enabled": true,
    "interval_seconds": 300
  }
}
1. User makes request with primary model
2. Model call fails (error, timeout, rate limit)
3. Skill detects failure
4. Wait 3 seconds (debounce)
5. Switch to next model in chain
6. Retry request with new model
7. If successful, return result
8. If failed, repeat steps 4-7
9. If all models fail, return error with details
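The steps above can be sketched as a simple loop. This is an illustration under stated assumptions rather than the skill's implementation: `call_with_fallback`, `AllModelsFailed`, and the `call_model` callback are hypothetical names, every exception is treated as a fallback trigger, and per-model `max_retries` and the auth-error stop are left out for brevity.

```python
import time


class AllModelsFailed(Exception):
    """Raised when every model in the chain has failed (step 9)."""

    def __init__(self, errors):
        details = "; ".join(f"{model}: {error}" for model, error in errors)
        super().__init__(f"all models failed: {details}")
        self.errors = errors


def call_with_fallback(chain, call_model, request, debounce_seconds=3.0, sleep=time.sleep):
    """Try each model in order and return (model_name, result) from the first success."""
    errors = []
    for index, entry in enumerate(chain):
        if index > 0:
            sleep(debounce_seconds)  # step 4: debounce before switching models
        try:
            # steps 1 and 6: call the current model; step 7: return on success
            return entry["model"], call_model(entry, request)
        except Exception as exc:  # steps 2 and 3: any error counts as a failure here
            errors.append((entry["model"], str(exc)))
    raise AllModelsFailed(errors)


def fake_call(entry, request):
    """Stand-in for a real API call: the primary model is always rate limited."""
    if entry["model"] == "MiniMax-M2.5":
        raise RuntimeError("rate limit exceeded")
    return f"echo: {request}"


chain = [{"model": "MiniMax-M2.5"}, {"model": "kimi-k2.5"}]
model, reply = call_with_fallback(chain, fake_call, "Hello", sleep=lambda seconds: None)
print(model, reply)
# prints kimi-k2.5 echo: Hello
```

The `sleep` parameter is injected so that the 3-second debounce can be skipped in tests while the default behavior still waits.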
| Trigger | Condition | Action |
|---|---|---|
| API Unavailable | Connection timeout | Fallback |
| Rate Limit | 429 response | Fallback + wait |
| Slow Response | > timeout seconds | Fallback |
| Invalid Response | Parse error | Fallback |
| Auth Error | 401/403 response | Log + stop |
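
The trigger table above maps naturally onto a small classification function. The sketch below is one possible encoding; the `Action` enum, the `classify` function, and its parameters are hypothetical, and the order in which conditions are checked is an assumption.

```python
from enum import Enum


class Action(Enum):
    NONE = "none"
    FALLBACK = "fallback"
    FALLBACK_AND_WAIT = "fallback + wait"
    STOP = "log + stop"


def classify(status=None, elapsed=None, timeout=30, connected=True, parse_ok=True):
    """Map the outcome of one model call to the action from the trigger table."""
    if not connected:
        return Action.FALLBACK           # API unavailable: connection timeout
    if status in (401, 403):
        return Action.STOP               # auth error: switching models will not help
    if status == 429:
        return Action.FALLBACK_AND_WAIT  # rate limit
    if elapsed is not None and elapsed > timeout:
        return Action.FALLBACK           # slow response
    if not parse_ok:
        return Action.FALLBACK           # invalid response: parse error
    return Action.NONE


print(classify(status=429).value)
# prints fallback + wait
print(classify(status=200, elapsed=45, timeout=30).value)
# prints fallback
```

Auth errors are checked before the other conditions because the table treats them as the one case where fallback should not continue.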
Logs are written to:
~/.openclaw/logs/model-fallback.log
[2026-02-27 14:00:00] [INFO] Primary model MiniMax-M2.5 called
[2026-02-27 14:00:05] [WARN] Model failed: rate limit exceeded
[2026-02-27 14:00:05] [INFO] Falling back to Kimi K2.5
[2026-02-27 14:00:10] [INFO] Fallback successful
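Log lines in the format shown above can be parsed with a short regular expression, for example to count warnings. This sketch assumes every line follows the `[timestamp] [LEVEL] message` layout of the sample; `parse_line` is a hypothetical helper, not something shipped with the skill.

```python
import re
from datetime import datetime

# Layout inferred from the sample log lines: [timestamp] [LEVEL] message
LINE_RE = re.compile(r"^\[(?P<ts>[^\]]+)\] \[(?P<level>[A-Z]+)\] (?P<message>.*)$")


def parse_line(line):
    """Return a dict for one log line, or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return {
        "time": datetime.strptime(match["ts"], "%Y-%m-%d %H:%M:%S"),
        "level": match["level"],
        "message": match["message"],
    }


SAMPLE = """\
[2026-02-27 14:00:00] [INFO] Primary model MiniMax-M2.5 called
[2026-02-27 14:00:05] [WARN] Model failed: rate limit exceeded
[2026-02-27 14:00:05] [INFO] Falling back to Kimi K2.5
[2026-02-27 14:00:10] [INFO] Fallback successful
"""

records = [parse_line(line) for line in SAMPLE.splitlines()]
warnings = [r["message"] for r in records if r and r["level"] == "WARN"]
print(warnings)
# prints ['Model failed: rate limit exceeded']
```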
Use cheaper models for simple tasks:
{
  "task_routing": {
    "simple_query": ["glm-4-air", "glm-4-flash"],
    "complex_reasoning": ["MiniMax-M2.5", "kimi-k2.5"],
    "long_context": ["kimi-k2.5", "MiniMax-M2.1"]
  }
}
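One way to read the routing table above is as an ordered preference list per task type. The sketch below picks the first model for a task that is not currently marked unavailable. The `pick_model` function and the `unavailable` set are hypothetical, and how a request is classified into a task type is not specified by the source, so the task type is simply passed in.

```python
TASK_ROUTING = {
    "simple_query": ["glm-4-air", "glm-4-flash"],
    "complex_reasoning": ["MiniMax-M2.5", "kimi-k2.5"],
    "long_context": ["kimi-k2.5", "MiniMax-M2.1"],
}


def pick_model(task_type, unavailable=(), routing=TASK_ROUTING):
    """Return the first preferred model for task_type that is still available."""
    # Assumption: an unknown task type has no candidates rather than a default route.
    for model in routing.get(task_type, []):
        if model not in unavailable:
            return model
    return None


print(pick_model("simple_query"))
# prints glm-4-air
print(pick_model("simple_query", unavailable={"glm-4-air"}))
# prints glm-4-flash
```

Returning `None` when nothing is available leaves the caller free to decide whether to fall through to the general fallback chain or report an error.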
Add to openclaw.json:
{
  "models": {
    "mode": "merge",
    "fallback": {
      "enabled": true,
      "config": "~/.openclaw/skills/model-fallback/config.json"
    }
  }
}
Integrate with system health monitoring:
# Check model health
curl http://localhost:18789/api/models/health
echo $MODEL_FALLBACK_ENABLED
ls ~/.openclaw/skills/model-fallback/config.json
tail -f ~/.openclaw/logs/model-fallback.log
User: "Hello"
System: Using MiniMax-M2.5...
System: Rate limited, switching to Kimi K2.5...
System: Response from Kimi K2.5: "Hello! How can I help?"
User: "What is 2+2?"
System: Routing to glm-4-air (low cost)...
System: Response: "2+2=4"
User: "Summarize this 100-page PDF"
System: Detected long context requirement
System: Routing to Kimi K2.5 (256K context)...
System: Processing...
License: MIT
Author: CC (AI Assistant)
Version: 1.0.0