Name: Add Image Vision
Author: qwibitai

Image Vision Skill

Adds the ability for NanoClaw agents to see and understand images sent via WhatsApp. Images are downloaded, resized with sharp, saved to the group workspace, and passed to the agent as base64-encoded multimodal content blocks.
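As a rough illustration of the last step, the sketch below wraps raw image bytes in a base64 image content block. The `ImageBlock` shape, the `toImageBlock` helper, and the `image/jpeg` default are assumptions made for this example and are not taken from src/image.ts. The encoder is written by hand only to keep the sketch dependency-free. In Node, `Buffer.from(bytes).toString("base64")` produces the same string.

```typescript
// Hypothetical shape of a base64 image content block. It follows the common
// multimodal message layout; the exact shape NanoClaw uses is an assumption.
interface ImageBlock {
  type: "image";
  source: { type: "base64"; media_type: string; data: string };
}

const ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Dependency-free base64 encoder (standard alphabet, with "=" padding).
function toBase64(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const hasB1 = i + 1 < bytes.length;
    const hasB2 = i + 2 < bytes.length;
    const b0 = bytes[i];
    const b1 = hasB1 ? bytes[i + 1] : 0;
    const b2 = hasB2 ? bytes[i + 2] : 0;
    out += ALPHABET[b0 >> 2];
    out += ALPHABET[((b0 & 0x03) << 4) | (b1 >> 4)];
    out += hasB1 ? ALPHABET[((b1 & 0x0f) << 2) | (b2 >> 6)] : "=";
    out += hasB2 ? ALPHABET[b2 & 0x3f] : "=";
  }
  return out;
}

// Hypothetical helper: wrap already-resized image bytes in a content block.
// The default media type is an assumption about the resize output format.
function toImageBlock(bytes: Uint8Array, mediaType = "image/jpeg"): ImageBlock {
  return {
    type: "image",
    source: { type: "base64", media_type: mediaType, data: toBase64(bytes) },
  };
}
```

In a real pipeline the bytes would come from the resized file saved in the group workspace, and the resulting block would sit alongside any text blocks in the message passed to the agent.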

Phase 1: Pre-flight

Check whether src/image.ts exists. If it does, the skill is already applied; skip to Phase 3.
Confirm that sharp can be installed (its native bindings require build tools).
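The first check can be sketched as a small decision function. The names `preflight` and `NextStep` are hypothetical and exist only for this example. The file-existence test is passed in as a parameter so the logic can be exercised without touching disk.

```typescript
// Hypothetical result type for the pre-flight decision.
type NextStep = "skip-to-phase-3" | "apply-phase-2";

// Decide the next phase based on whether the marker file is present.
// `exists` is injected by the caller; in a real run it could be
// `existsSync` from "node:fs".
function preflight(
  exists: (path: string) => boolean,
  marker: string = "src/image.ts",
): NextStep {
  return exists(marker) ? "skip-to-phase-3" : "apply-phase-2";
}
```

For example, `preflight(existsSync)` run from the repository root would report whether Phase 2 still needs to be applied. This sketch does not cover the sharp check, which depends on the local build toolchain.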

Prerequisite: WhatsApp must be installed first (skill/whatsapp merged). This skill modifies WhatsApp channel files.

Phase 2: Apply Code Changes

Ensure WhatsApp fork remote

git remote -v

If whatsapp is missing, add it:

Merge the skill branch

Validate code changes

Phase 3: Configure

Phase 4: Verify

Troubleshooting
