Name: See
Author: qte77

Capture the screen, run a local vision-language model (Qwen2.5-VL by default, via llama-cpp-python), and return a short text description for injection into Claude's context. This is token-efficient: roughly 120 tokens per call, versus roughly 1,600 for sending the raw image to Claude's vision API. No external daemon is needed; the model runs in-process via llama-cpp-python.
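To put the two token figures in perspective, the saving can be worked out directly. This is a minimal sketch that uses only the approximate per-call figures quoted above; the function name and the 50-call session are illustrative assumptions, not part of the plugin.

```python
# Approximate per-call token costs quoted in the description above.
TOKENS_TEXT_DESCRIPTION = 120   # /see: local VLM -> short text
TOKENS_RAW_IMAGE = 1_600        # raw image sent to Claude's vision API


def tokens_saved(calls: int) -> int:
    """Tokens saved over `calls` captures by sending text instead of images."""
    return calls * (TOKENS_RAW_IMAGE - TOKENS_TEXT_DESCRIPTION)


reduction = 1 - TOKENS_TEXT_DESCRIPTION / TOKENS_RAW_IMAGE
print(f"per-call reduction: {reduction:.1%}")        # 92.5%
print(f"saved over 50 calls: {tokens_saved(50):,}")  # 74,000
```

Both inputs are estimates, so the results are order-of-magnitude guidance rather than exact accounting.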

Install — three steps

# 1. Core scaffolding deps (mss, Pillow, blake3)
make setup_see

# 2. llama-cpp-python (pick ONE matching your hardware)
#    See `make setup_see` output for the exact commands.
#    Examples:
uv pip install 'llama-cpp-python' \
  --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cpu
# or CUDA 12.4:
# uv pip install 'llama-cpp-python' --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cu124
# or Metal:
# CMAKE_ARGS='-DGGML_METAL=on' uv pip install llama-cpp-python

# 3. Download the Qwen2.5-VL GGUF + mmproj files
mkdir -p ~/.cache/cc-voice/models
cd ~/.cache/cc-voice/models
wget https://huggingface.co/bartowski/Qwen2.5-VL-3B-Instruct-GGUF/resolve/main/Qwen2.5-VL-3B-Instruct-Q4_K_M.gguf
wget https://huggingface.co/bartowski/Qwen2.5-VL-3B-Instruct-GGUF/resolve/main/mmproj-Qwen2.5-VL-3B-Instruct-f16.gguf
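After step 3 it can be useful to confirm that both files actually landed in the model directory before the first `/see` call. The following is a hedged sketch, not part of the plugin: the directory and file names are taken from the commands above, while `missing_model_files` is a hypothetical helper introduced here for illustration.

```python
from pathlib import Path

# Directory and file names taken from the download step above.
MODEL_DIR = Path.home() / ".cache" / "cc-voice" / "models"
REQUIRED_FILES = (
    "Qwen2.5-VL-3B-Instruct-Q4_K_M.gguf",
    "mmproj-Qwen2.5-VL-3B-Instruct-f16.gguf",
)


def missing_model_files(model_dir: Path = MODEL_DIR) -> list[str]:
    """Return the required file names that are absent or empty in model_dir."""
    missing = []
    for name in REQUIRED_FILES:
        path = model_dir / name
        # A zero-byte file usually means an interrupted download.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_model_files()
    if missing:
        print("Missing model files:", ", ".join(missing))
    else:
        print("Both model files are present in", MODEL_DIR)
```

The check only looks at presence and size; it does not verify checksums or that the GGUF files load.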

Token budget

| Path | Tokens per call | Notes |
|---|---|---|
| `/see` (in-process VLM → text) | ~120 | Default. No daemon, no HTTP round-trip. |
| `/see` cache hit (unchanged screen) | 0 | Frame hashed via BLAKE3; same image + template means no VLM call. |
| Sending raw image to Claude vision (Tier 1, deferred) | ~1,600 | Opt-in via a future `--vision` flag. |

Removing changes made by `/see`

| Artifact | Removal |
|---|---|
| Per-session cache (in-memory `DescribeCache`) | Exits with the Python process. Nothing to clean. |
| Temp JPEGs (`/tmp/tmp*.jpg`) | `make clean_see_artifacts` |
| Downloaded GGUF + mmproj (`~/.cache/cc-voice/models/`) | `make clean_models` |
| Python venv + pytest/ruff caches | `make clean` |
| All of the above at once | `make clean_all` |
| Plugin installation (if done via `make plugin_install_local`) | `make plugin_uninstall` |
| `llama-cpp-python` wheel | `uv pip uninstall llama-cpp-python` (manual, since it is not in the `[see]` extras) |
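Before running the temp-JPEG cleanup, it may help to see which files the `tmp*.jpg` pattern would match. The sketch below is an assumption-laden illustration, not the recipe behind `make clean_see_artifacts`: both helper names are hypothetical, and the pattern is simply the one listed in the table.

```python
import glob
import os
import tempfile
from typing import Optional


def find_temp_jpegs(tmp_dir: Optional[str] = None) -> list[str]:
    """List files matching tmp*.jpg in tmp_dir (default: the system temp dir)."""
    tmp_dir = tmp_dir or tempfile.gettempdir()
    return sorted(glob.glob(os.path.join(tmp_dir, "tmp*.jpg")))


def remove_temp_jpegs(tmp_dir: Optional[str] = None, dry_run: bool = True) -> list[str]:
    """Return the matching paths; delete them only when dry_run is False."""
    paths = find_temp_jpegs(tmp_dir)
    if not dry_run:
        for path in paths:
            os.remove(path)
    return paths


if __name__ == "__main__":
    # Dry run by default: prints candidates without deleting anything.
    for path in remove_temp_jpegs():
        print("would remove:", path)
```

The pattern matches any `tmp*.jpg` in the temp directory, including files created by other programs, so reviewing the dry-run output first is the safer order.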
