Use Ollama local inference for text generation, classification, and analysis via local GPU models.
Use this skill when FORGE needs local LLM inference without external API calls. Ollama runs on the local GPU server (RTX 3090) and provides zero-cost text generation, classification, and analysis.
Default operating pattern:
ollama run via stdin for multi-line prompts.

Ollama is not a coding agent. It does not edit files, manage sessions, or run tools. Use it for inference tasks: classification, summarisation, text generation, and structured analysis.
Choose the model based on the task:
| Routing tier | Model | Use case |
|---|---|---|
| fast | gpt-oss:latest | Short answers, quick classification, low-latency checks |
| balanced | qwen3-coder:latest | Code analysis, structured classification, moderate reasoning |
| high | qwen3.5:27b | Complex reasoning, long-form analysis, nuanced generation |
When the caller does not specify a model, default to qwen3-coder:latest.
generate(prompt, model)
General-purpose text generation.
Preferred command:
printf '%s\n' "$PROMPT" | ollama run "$MODEL"
For short single-line prompts:
ollama run "$MODEL" "$PROMPT"
Expected behavior:
The model's completion is written to stdout as plain text; a non-zero exit code indicates failure.
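The pattern above can be wrapped in a small shell function. This is a sketch, not part of Ollama or FORGE: the `generate` name, the `DRY_RUN` guard (so the sketch can run without a GPU server), and the fallback to the skill's default model are all illustrative.

```shell
# Hypothetical wrapper around `ollama run`; falls back to this skill's
# default model when no model is given. DRY_RUN is an illustrative guard
# that echoes the command instead of calling the server.
generate() {
  prompt="$1"
  model="${2:-qwen3-coder:latest}"
  if [ -n "${DRY_RUN:-}" ]; then
    echo "would run: ollama run $model"
    return 0
  fi
  printf '%s\n' "$prompt" | ollama run "$model"
}

DRY_RUN=1 generate "Summarise the attached log in two sentences."
```

Unset DRY_RUN to perform the real stdin-piped invocation.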
classify(text, categories, model)
Classify input text into one of the provided categories.
Preferred command:
printf 'Classify the following text into exactly one of these categories: %s\n\nText: %s\n\nRespond with only the category name.' "$CATEGORIES" "$TEXT" | ollama run "$MODEL"
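As a worked example, the command above with made-up category and text values; the prompt is echoed here and the model call is left commented so the sketch runs without a server:

```shell
# Illustrative inputs for the classify prompt template above.
CATEGORIES="bug, feature, question"
TEXT="The app crashes when I click Save."
PROMPT=$(printf 'Classify the following text into exactly one of these categories: %s\n\nText: %s\n\nRespond with only the category name.' "$CATEGORIES" "$TEXT")
printf '%s\n' "$PROMPT"
# Pipe to a fast model (requires a running Ollama server):
#   printf '%s\n' "$PROMPT" | ollama run gpt-oss:latest
```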
Expected behavior:
The model responds with only the category name on stdout.
qwen3-coder:latest or gpt-oss:latest for fast classification

analyze(prompt, model)
Longer-form analysis such as code review summaries, log analysis, or document review.
Preferred command:
printf '%s\n' "$PROMPT" | ollama run "$MODEL"
Expected behavior:
A plain-text analysis is written to stdout.
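A multi-line analysis prompt is easiest to assemble with a heredoc before piping. The log lines below are fabricated sample data, and the model invocation is commented out so the sketch runs offline:

```shell
# Build a multi-line prompt with a quoted heredoc (no shell expansion inside).
PROMPT=$(cat <<'EOF'
Review the following log excerpt and summarise the likely root cause:
ERROR db: connection refused
ERROR db: retry 3/3 failed
EOF
)
printf '%s\n' "$PROMPT"
# printf '%s\n' "$PROMPT" | ollama run qwen3.5:27b
```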
qwen3.5:27b for complex analysis tasks
qwen3-coder:latest for code-specific analysis

| FORGE session field | Ollama CLI |
|---|---|
| prompt | stdin pipe or positional arg |
| agent = "ollama" | ollama run |
| model | model name arg (e.g., qwen3-coder:latest) |
| tools | N/A |
| workdir | process working directory |
| timeout | managed by FORGE wrapper |
| budget | N/A (local inference, no cost) |
| resume | N/A |
| output_mode | N/A (always plain text) |
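The mapping above can be sketched as a single invocation. This is hypothetical: the variable names mirror the table, the values are illustrative, and using coreutils timeout(1) for the timeout field is an assumption about how the FORGE wrapper might enforce it. The command is echoed, not executed, so the sketch runs without a GPU server:

```shell
MODEL="qwen3-coder:latest"                                        # model
TIMEOUT_S=120                                                     # timeout (wrapper-enforced)
PROMPT="Classify this commit message: fix null deref in parser"   # prompt -> stdin pipe
CMD="timeout ${TIMEOUT_S}s ollama run $MODEL"
printf 'would run: %s\n' "$CMD"
# Real invocation: printf '%s\n' "$PROMPT" | timeout "${TIMEOUT_S}s" ollama run "$MODEL"
```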
| Ollama output | AgentResult field |
|---|---|
| stdout (full text) | output |
| exit code 0 | success (no error) |
| non-zero exit code | error |
| N/A | session_id (not supported) |
| N/A | tokens_in (not exposed by CLI) |
| N/A | tokens_out (not exposed by CLI) |
| 0.00 | cost_usd (local, always zero) |
Limitation: ollama run does not expose token usage or timing metadata in its CLI output. If token tracking is needed, use the OpenAI-compatible API endpoint (http://<host>:11434/v1) instead of the CLI.
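A minimal sketch of that endpoint usage, assuming the default port from the limitation above; the request body is echoed here and the curl call is left commented so nothing contacts a server:

```shell
# OLLAMA_HOST is an assumption; replace with your GPU server's host:port.
OLLAMA_HOST="localhost:11434"
BODY='{"model": "qwen3-coder:latest", "messages": [{"role": "user", "content": "Say hi"}]}'
echo "$BODY"
# curl -s "http://$OLLAMA_HOST/v1/chat/completions" \
#   -H 'Content-Type: application/json' -d "$BODY" | jq '.usage'
# The response's usage object carries prompt_tokens and completion_tokens.
```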
Check which models are installed locally before use (ollama list).