Debug why inference.local or external inference setup is failing. Use when the user cannot reach a local model server, has provider base URL issues, sees inference verification failures, hits protocol mismatches, or needs to diagnose inference on local vs remote gateways. Trigger keywords - debug inference, inference.local, local inference, ollama, vllm, sglang, trtllm, NIM, inference failing, model server unreachable, failed to verify inference endpoint, host.openshell.internal.
Diagnose why OpenShell inference is failing and recommend exact fix commands.
Use openshell CLI commands to inspect the active gateway, provider records, managed inference config, and sandbox behavior. Use a short sandbox probe when needed to confirm end-to-end routing.
OpenShell supports two different inference paths. Diagnose the correct one first.

- Managed inference: the sandbox calls `https://inference.local`, and the gateway routes the request to the backend configured with `openshell inference set`.
- Direct external access: the application calls an external host such as `api.openai.com` directly, governed by `network_policies`.

For local or self-hosted engines such as Ollama, vLLM, SGLang, TRT-LLM, and many NIM deployments, the most common managed inference pattern is an `openai` provider with `OPENAI_BASE_URL` pointing at a host the gateway can reach.

These commands assume `openshell` is on the PATH. Use them first:
# Which gateway is active, and can the CLI reach it?
openshell status
# Show managed inference config for inference.local
openshell inference get
# Inspect the provider record referenced by inference.local
openshell provider get <provider-name>
# Inspect gateway topology details when remote/local confusion is suspected
openshell gateway info
# Run a minimal end-to-end probe from a sandbox
openshell sandbox create -- curl https://inference.local/v1/chat/completions --json '{"messages":[{"role":"user","content":"hello"}],"max_tokens":10}'
When the user asks to debug inference, run diagnostics automatically in this order. Stop and report findings as soon as a root cause is identified.
Establish these facts first:
- Is the application calling `https://inference.local` (managed inference) or a direct external host?
- Is the active gateway local or remote?

Run:
openshell status
openshell gateway info
Look for:

- Whether the gateway is local or remote
- Whether `host.openshell.internal` would point to the local machine or a remote host

Common mistake:
On a remote gateway, `host.openshell.internal` points to the remote gateway host, not your laptop. A laptop-local Ollama or vLLM server will not be reachable without a tunnel or shared reachable network path.

Run:
openshell inference get
Interpretation:
Not configured: inference.local has no backend yet. Fix by configuring it:
openshell inference set --provider <name> --model <id>
Provider and model shown: Continue to provider inspection.
Run:
openshell provider get <provider-name>
Check:
- The provider type is correct: `openai` for OpenAI-compatible engines such as Ollama, vLLM, SGLang, TRT-LLM, and many NIM deployments; `anthropic` for the Anthropic Messages API; `nvidia` for NVIDIA-hosted OpenAI-compatible endpoints
- The `*_BASE_URL` override is correct when using a self-hosted endpoint

Fix examples:
openshell provider create --name ollama --type openai --credential OPENAI_API_KEY=empty --config OPENAI_BASE_URL=http://host.openshell.internal:11434/v1
openshell provider update ollama --type openai --credential OPENAI_API_KEY=empty --config OPENAI_BASE_URL=http://host.openshell.internal:11434/v1
For host-backed local inference, confirm the upstream server:
- It listens on `0.0.0.0`, not only `127.0.0.1`
- It is reachable via `host.openshell.internal`, the host's LAN IP, or another reachable hostname

Common mistakes:
- Base URL uses `127.0.0.1` or `localhost`: usually wrong for managed inference. Replace with `host.openshell.internal` or the host's LAN IP.
- Server bound only to loopback: restart it listening on `0.0.0.0`.

Managed inference only works for `https://inference.local` and supported inference API paths.
Supported patterns include:
- `POST /v1/chat/completions`
- `POST /v1/completions`
- `POST /v1/responses`
- `POST /v1/messages`
- `GET /v1/models`

Common mistakes:
- Using `http://inference.local` instead of `https://inference.local`
- Sending Anthropic-style requests to an `openai` provider, or vice versa

Fix guidance:
- Point OpenAI-compatible clients at the base URL `https://inference.local/v1`
- Use any non-empty placeholder API key such as `test`; OpenShell injects the real credential

Run a minimal request from inside a sandbox:
openshell sandbox create -- curl https://inference.local/v1/chat/completions --json '{"messages":[{"role":"user","content":"hello"}],"max_tokens":10}'
Interpretation:
- `cluster inference is not configured`: set the managed route with `openshell inference set`
- `connection not allowed by policy on inference.local`: unsupported method or path
- `no compatible route`: provider type and client API shape do not match

After fixing the provider, repoint `inference.local`:
openshell inference set --provider <name> --model <id>
If the endpoint is intentionally offline and you only want to save the config:
openshell inference set --provider <name> --model <id> --no-verify
Inference updates are hot-reloaded to all sandboxes on the active gateway within about 5 seconds by default.
If the application calls api.openai.com, api.anthropic.com, or another external host directly, this is not a managed inference issue.
Check instead whether `network_policies` allow that host, port, and HTTP rules. Use the generate-sandbox-policy skill when the user needs help authoring policy YAML.
Use this fix when a sandbox can reach https://inference.local, but OpenShell reports an upstream timeout against a host-local backend such as Ollama.
Example symptom:
{"error":"request to http://host.docker.internal:11434/v1/models timed out"}
This failure commonly appears on Linux hosts that:

- run a restrictive host firewall (iptables, nftables, firewalld, or UFW)
- point `inference.local` to a host-local OpenAI-compatible endpoint such as Ollama

In this case, OpenShell routing is usually working correctly. The failing hop is container-to-host traffic on the backend port.
This is not the same issue as the Colima CoreDNS fix.
OpenShell injects host.docker.internal and host.openshell.internal into sandbox pods with hostAliases. That path bypasses cluster DNS lookup. If the request still times out, the usual cause is host firewall or network policy, not CoreDNS.
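As a quick sanity check before the numbered steps below, you can confirm the aliases really are injected as `/etc/hosts` entries (so DNS is not involved); `<gateway>` is a placeholder for your gateway name:

```shell
# Both aliases should appear as static /etc/hosts entries inside the
# cluster container. If they do, a timeout points at the firewall or
# network path, not at name resolution.
docker exec openshell-cluster-<gateway> grep -E 'host\.(docker|openshell)\.internal' /etc/hosts
```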
Confirm the model server works on the host:
curl -sS http://127.0.0.1:11434/v1/models
Confirm the host gateway address also works on the host:
curl -sS http://172.17.0.1:11434/v1/models
Test the same endpoint from the OpenShell cluster container:
docker exec openshell-cluster-<gateway> wget -qO- -T 5 http://host.docker.internal:11434/v1/models
If steps 1 and 2 succeed but step 3 times out, the host firewall or network configuration is blocking the container-to-host path.
Allow the Docker bridge network used by the OpenShell cluster to reach the host-local inference port. The exact command depends on your firewall tooling (iptables, nftables, firewalld, UFW, etc.), but the rule should allow:
- Source: the Docker bridge subnet used by the OpenShell cluster (for example `172.18.0.0/16`)
- Destination: the host gateway IP behind `host.docker.internal` (commonly `172.17.0.1`)
- Port: the inference backend port (`11434/tcp` for Ollama)

To find the actual values on your system:
# Docker bridge subnet for the OpenShell cluster network
docker network inspect $(docker network ls --filter name=openshell -q) --format '{{range .IPAM.Config}}{{.Subnet}}{{end}}'
# Host gateway IP visible from inside the container
docker exec openshell-cluster-<gateway> grep host.docker.internal /etc/hosts
Adjust the source subnet, destination IP, or port to match your local Docker network layout.
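One concrete sketch, assuming iptables with Docker's `DOCKER-USER` chain and the example values above (the subnet, gateway IP, and port are assumptions; substitute the values you discovered):

```shell
# Allow the cluster's bridge subnet (assumed 172.18.0.0/16) to reach the
# host gateway IP (assumed 172.17.0.1) on the Ollama port (assumed 11434).
# DOCKER-USER is the chain Docker reserves for user-defined filtering rules.
sudo iptables -I DOCKER-USER -s 172.18.0.0/16 -d 172.17.0.1 -p tcp --dport 11434 -j ACCEPT
```

firewalld and UFW have equivalent forms (for example `ufw route allow` for forwarded traffic). A bare iptables insert does not survive reboot; persist it with your distribution's tooling.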
Re-run the cluster container check:
docker exec openshell-cluster-<gateway> wget -qO- -T 5 http://host.docker.internal:11434/v1/models
Re-test from a sandbox:
curl -sS https://inference.local/v1/models
Both commands should return the upstream model list.
Other useful checks while debugging:

ss -ltnp | rg ':11434\b'
openshell provider get <provider-name>
openshell inference get
openshell logs <sandbox-name> --since 10m

| Symptom | Likely cause | Fix |
|---|---|---|
| `openshell inference get` shows `Not configured` | No managed inference route configured | `openshell inference set --provider <name> --model <id>` |
| `failed to verify inference endpoint` | Bad base URL, wrong credentials, wrong provider type, or upstream not reachable | Fix provider config, then rerun `openshell inference set`; use `--no-verify` only when the endpoint is intentionally offline |
| Base URL uses `127.0.0.1` | Loopback points at the wrong runtime | Use `host.openshell.internal` or another gateway-reachable host |
| Local engine works only when gateway is local | Gateway moved to remote host | Run the engine on the gateway host, add a tunnel, or use direct external access |
| `connection not allowed by policy on inference.local` | Unsupported path or method | Use a supported inference API path |
| `no compatible route` | Provider type does not match request shape | Switch provider type or change the client API |
| Direct call to external host is denied | Missing policy or provider attachment | Update `network_policies` and launch sandbox with the right provider |
| SDK fails on empty auth token | Client requires a non-empty API key even though OpenShell injects the real one | Use any placeholder token such as `test` |
| Upstream timeout from container to host-local backend | Host firewall or network config blocks container-to-host traffic | Allow the Docker bridge subnet to reach the inference port on the host gateway IP (see firewall fix section above) |
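For the empty-auth-token symptom, a minimal probe that sidesteps strict client checks (the token value is an arbitrary placeholder; OpenShell replaces it with the real credential at the gateway):

```shell
# Any non-empty bearer token satisfies SDKs that reject empty keys;
# "test" is arbitrary and never reaches the upstream provider.
curl -sS https://inference.local/v1/models -H "Authorization: Bearer test"
```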
Run this when you want a compact report before deciding on a fix:
echo "=== Gateway Status ==="
openshell status
echo "=== Gateway Info ==="
openshell gateway info
echo "=== Managed Inference ==="
openshell inference get
echo "=== Providers ==="
openshell provider list
echo "=== Selected Provider ==="
openshell provider get <provider-name>
echo "=== Sandbox Probe ==="
openshell sandbox create -- curl https://inference.local/v1/chat/completions --json '{"messages":[{"role":"user","content":"hello"}],"max_tokens":10}'
When you report back, state:

- Which inference path is in use (`inference.local` vs direct external)
- The identified root cause
- The exact fix commands to run