技能檔案

NemoClaw User Configure Inference

Name: NemoClaw User Configure Inference
Author: NVIDIA

Lists all inference providers offered during NemoClaw onboarding. Use when explaining which providers are available, what the onboard wizard presents, or how inference routing works. Changes the active inference model without restarting the sandbox. Use when switching inference providers, changing the model runtime, or reconfiguring inference routing. Connects NemoClaw to a local inference server. Use when setting up Ollama, vLLM, TensorRT-LLM, NIM, or any OpenAI-compatible local model server with NemoClaw.

NVIDIA19,418 星標2026年4月17日

職業
分類: 雲端

技能內容

NemoClaw User Configure Inference

Lists all inference providers offered during NemoClaw onboarding. Use when explaining which providers are available, what the onboard wizard presents, or how inference routing works.

Context

NemoClaw supports multiple inference providers. During onboarding, the nemoclaw onboard wizard presents a numbered list of providers to choose from. Your selection determines where the agent's inference traffic is routed.

How Inference Routing Works

The agent inside the sandbox talks to inference.local. It never connects to a provider directly. OpenShell intercepts inference traffic on the host and forwards it to the provider you selected.

Provider credentials stay on the host. The sandbox does not receive your API key.

Provider Status

相關技能

NemoClaw User Configure Inference | Skills Pool

技能檔案

NemoClaw User Configure Inference

NVIDIA19,418 星標2026年4月17日

職業
分類: 雲端

技能內容

NemoClaw User Configure Inference

Lists all inference providers offered during NemoClaw onboarding. Use when explaining which providers are available, what the onboard wizard presents, or how inference routing works.

Context

How Inference Routing Works

The agent inside the sandbox talks to inference.local. It never connects to a provider directly. OpenShell intercepts inference traffic on the host and forwards it to the provider you selected.

Provider credentials stay on the host. The sandbox does not receive your API key.

Provider Status

相關技能

Provider	Status	Endpoint type	Notes
NVIDIA Endpoints	Tested	OpenAI-compatible	Hosted models on integrate.api.nvidia.com
OpenAI	Tested	Native OpenAI-compatible	Uses OpenAI model IDs
Other OpenAI-compatible endpoint	Tested	Custom OpenAI-compatible	For compatible proxies and gateways
Anthropic	Tested	Native Anthropic	Uses anthropic-messages
Other Anthropic-compatible endpoint	Tested	Custom Anthropic-compatible	For Claude proxies and compatible gateways
Google Gemini	Tested	OpenAI-compatible	Uses Google's OpenAI-compatible endpoint
Local Ollama	Caveated	Local Ollama API	Available when Ollama is installed or running on the host
Local NVIDIA NIM	Experimental	Local OpenAI-compatible	Requires `NEMOCLAW_EXPERIMENTAL=1` and a NIM-capable GPU
Local vLLM	Experimental	Local OpenAI-compatible	Requires `NEMOCLAW_EXPERIMENTAL=1` and a server already running on `localhost:8000`

Option	Description	Curated models
NVIDIA Endpoints	Routes to models hosted on build.nvidia.com. You can also enter any model ID from the catalog. Set `NVIDIA_API_KEY`.	Nemotron 3 Super 120B, Kimi K2.5, GLM-5, MiniMax M2.5, GPT-OSS 120B
OpenAI	Routes to the OpenAI API. Set `OPENAI_API_KEY`.	`gpt-5.4`, `gpt-5.4-mini`, `gpt-5.4-nano`, `gpt-5.4-pro-2026-03-05`
Other OpenAI-compatible endpoint	Routes to any server that implements `/v1/chat/completions`. If the endpoint also supports `/responses` with OpenClaw-style tool calling, NemoClaw can use that path; otherwise it falls back to `/chat/completions`. The wizard prompts for a base URL and model name. Works with OpenRouter, LocalAI, llama.cpp, or any compatible proxy. Set `COMPATIBLE_API_KEY`.	You provide the model name.
Anthropic	Routes to the Anthropic Messages API. Set `ANTHROPIC_API_KEY`.	`claude-sonnet-4-6`, `claude-haiku-4-5`, `claude-opus-4-6`
Other Anthropic-compatible endpoint	Routes to any server that implements the Anthropic Messages API (`/v1/messages`). The wizard prompts for a base URL and model name. Set `COMPATIBLE_ANTHROPIC_API_KEY`.	You provide the model name.
Google Gemini	Routes to Google's OpenAI-compatible endpoint. NemoClaw prefers `/responses` only when the endpoint proves it can handle tool calling in a way OpenClaw uses; otherwise it falls back to `/chat/completions`. Set `GEMINI_API_KEY`.	`gemini-3.1-pro-preview`, `gemini-3.1-flash-lite-preview`, `gemini-3-flash-preview`, `gemini-2.5-pro`, `gemini-2.5-flash`, `gemini-2.5-flash-lite`
Local Ollama	Routes to a local Ollama instance on `localhost:11434`. NemoClaw detects installed models, offers starter models if none are present, pulls and warms the selected model, and validates it.	Selected during onboarding. For more information, refer to Use a Local Inference Server (see the `nemoclaw-user-configure-inference` skill).

$ openshell inference set --provider nvidia-prod --model nvidia/nemotron-3-super-120b-a12b

$ openshell inference set --provider openai-api --model gpt-5.4

$ openshell inference set --provider anthropic-prod --model claude-sonnet-4-6

$ openshell inference set --provider gemini-api --model gemini-2.5-flash

$ openshell inference set --provider compatible-endpoint --model <model-name>

$ openshell inference set --provider compatible-anthropic-endpoint --model <model-name>

$ nemoclaw onboard

$ NEMOCLAW_PREFERRED_API=openai-responses nemoclaw onboard

$ openshell inference set --provider anthropic-prod --model claude-sonnet-4-6 --no-verify

$ export NEMOCLAW_MODEL_OVERRIDE="anthropic/claude-sonnet-4-6"
$ export NEMOCLAW_INFERENCE_API_OVERRIDE="anthropic-messages"
$ nemoclaw onboard --resume --recreate-sandbox

$ nemoclaw <name> status

$ nemoclaw <name> status --json

$ nemoclaw onboard

$ NEMOCLAW_PROVIDER=ollama \
  NEMOCLAW_MODEL=qwen2.5:14b \
  nemoclaw onboard --non-interactive

$ vllm serve meta-llama/Llama-3.1-8B-Instruct --port 8000

$ nemoclaw onboard

$ NEMOCLAW_PROVIDER=custom \
  NEMOCLAW_ENDPOINT_URL=http://localhost:8000/v1 \
  NEMOCLAW_MODEL=meta-llama/Llama-3.1-8B-Instruct \
  COMPATIBLE_API_KEY=dummy \
  nemoclaw onboard --non-interactive

Variable	Purpose
`NEMOCLAW_PROVIDER`	Set to `custom` for an OpenAI-compatible endpoint.
`NEMOCLAW_ENDPOINT_URL`	Base URL of the local server.
`NEMOCLAW_MODEL`	Model ID as reported by the server.
`COMPATIBLE_API_KEY`	API key for the endpoint. Use any non-empty value if authentication is not required.

$ NEMOCLAW_PREFERRED_API=openai-responses nemoclaw onboard

$ nemoclaw onboard

$ NEMOCLAW_PROVIDER=anthropicCompatible \
  NEMOCLAW_ENDPOINT_URL=http://localhost:8080 \
  NEMOCLAW_MODEL=my-model \
  COMPATIBLE_ANTHROPIC_API_KEY=dummy \
  nemoclaw onboard --non-interactive

$ NEMOCLAW_EXPERIMENTAL=1 nemoclaw onboard

$ NEMOCLAW_EXPERIMENTAL=1 \
  NEMOCLAW_PROVIDER=vllm \
  nemoclaw onboard --non-interactive

$ NEMOCLAW_EXPERIMENTAL=1 nemoclaw onboard

$ NEMOCLAW_EXPERIMENTAL=1 \
  NEMOCLAW_PROVIDER=nim \
  nemoclaw onboard --non-interactive

$ export NEMOCLAW_LOCAL_INFERENCE_TIMEOUT=300
$ nemoclaw onboard

$ nemoclaw <name> status

$ openshell inference set --provider compatible-endpoint --model <model-name>

Option	Condition	Notes
Local NVIDIA NIM	NIM-capable GPU detected	Pulls and manages a NIM container.
Local vLLM	vLLM running on `localhost:8000`	Auto-detects the loaded model.

Variable	Purpose
`NEMOCLAW_PROVIDER`	Set to `ollama`.
`NEMOCLAW_MODEL`	Ollama model tag to use. Optional.

NemoClaw User Configure Inference

NemoClaw User Configure Inference

Context

How Inference Routing Works

Provider Status

NemoClaw User Configure Inference

NemoClaw User Configure Inference

Context

How Inference Routing Works

Provider Status

Provider Options

Experimental Options

Validation

Prerequisites

Step 1: Switch to a Different Model

NVIDIA Endpoints

OpenAI

Anthropic

Google Gemini

Compatible Endpoints

Switching from Responses API to Chat Completions

Step 2: Cross-Provider Switching

Step 3: Verify the Active Model

Step 4: Notes

Step 5: Ollama

Authenticated Reverse Proxy

Non-Interactive Setup

Step 6: OpenAI-Compatible Server

Non-Interactive Setup

Selecting the API Path

Step 7: Anthropic-Compatible Server

Step 8: vLLM Auto-Detection (Experimental)

Non-Interactive Setup

Step 9: NVIDIA NIM (Experimental)

Non-Interactive Setup

Step 10: Timeout Configuration

Step 11: Verify the Configuration

Step 12: Switch Models at Runtime

Related Skills

Feishu Drive

Nanoclaw Repl

Crosspost

Cloudflare

Mcp Integration

Setup Deploy