Controls the SunFounder PiDog robot via natural language using the Command PiDog REST API. Use when asked to make PiDog move, perform actions, react emotionally, look at the camera, read sensors, change LED colors, play sounds, navigate, greet people, do tricks, describe what it sees, or respond to touch and voice. Supports all 30 actions, vision/camera AI analysis, real-time sensor data, RGB LEDs, and WebSocket streaming.
Control a SunFounder PiDog robotic dog using natural language. This skill bridges voice/text commands to the full Command PiDog REST API running on a Raspberry Pi.
Use this skill when a user asks to:

- Make PiDog move, do tricks, or perform any of its 30 actions
- React emotionally, greet people, or respond to touch and voice
- Look through the camera and describe what it sees
- Read sensors, change LED colors, play sounds, or navigate

Prerequisites:

- The API server is running on the Pi: `uvicorn app.main:app --host 0.0.0.0 --port 8000`
- Base URL: `http://<pi-hostname>:8000/api/v1`
- Vision features need the camera started (`POST /camera/start`) and a vision-capable model configured
- Environment variables (`api/.env`):
```
PIDOG_OLLAMA_URL=http://localhost:11434
PIDOG_OLLAMA_MODEL=llama3.2:3b
PIDOG_OLLAMA_VISION_MODEL=llava:7b
PIDOG_OPENROUTER_API_KEY=sk-or-...
PIDOG_OPENROUTER_VISION_MODEL=meta-llama/llama-3.2-11b-vision-instruct
PIDOG_CAMERA_ENABLED=true
```
```
POST /api/v1/actions/execute
Content-Type: application/json

{"actions": ["sit", "handshake"], "speed": 80}
```
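For scripting, the same request can be sent from Python's standard library. A minimal sketch: the hostname and helper names are illustrative; only the endpoint path and payload shape come from this document.

```python
import json
import urllib.request

BASE = "http://raspberrypi.local:8000/api/v1"  # hypothetical Pi hostname

def build_action_request(actions, speed=80):
    """JSON body for POST /actions/execute (shape from the API doc)."""
    return {"actions": list(actions), "speed": speed}

def execute(actions, speed=80):
    """POST an action list to the PiDog API; returns the parsed JSON reply."""
    body = json.dumps(build_action_request(actions, speed)).encode()
    req = urllib.request.Request(
        f"{BASE}/actions/execute", data=body,
        headers={"Content-Type": "application/json"}, method="POST")
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)

# execute(["sit", "handshake"])  # requires a live PiDog API on the network
```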
All 30 actions (speed 0–100):
| Category | Actions |
|---|---|
| Movement | forward, backward, turn left, turn right, stop |
| Postures | stand, sit, lie |
| Expressions | bark, bark harder, pant, howling, wag tail, shake head, nod, think, recall, fluster, surprise |
| Social | handshake, high five, lick hand, scratch |
| Physical | stretch, push up, twist body, relax neck |
| Idle | doze off, waiting, feet shake |
Posture dependencies (handled automatically):
- `doze off` requires lying first → send `["lie", "doze off"]`
- `handshake`, `high five`, `lick hand`, `scratch`, `nod`, `relax neck`, `feet shake` → need sitting
- `push up`, `twist body`, `forward`, `backward`, `turn left`, `turn right` → need standing

`GET /api/v1/sensors/all`
Returns: distance (cm), imu (pitch/roll °), touch (N/L/R/LS/RS), sound (direction °, detected bool)
Individual endpoints: /sensors/distance, /sensors/imu, /sensors/touch, /sensors/sound
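The posture dependencies above are resolved by the API automatically, but previewing the final action queue client-side can help with debugging. A sketch: the dependency sets are copied from the table above; the helper itself is not part of the API.

```python
# Actions that only work from a given posture (from the dependency table).
NEEDS_SITTING = {"handshake", "high five", "lick hand", "scratch",
                 "nod", "relax neck", "feet shake"}
NEEDS_STANDING = {"push up", "twist body", "forward", "backward",
                  "turn left", "turn right"}
NEEDS_LYING = {"doze off"}

def with_posture(actions):
    """Prepend the required posture before any action that needs one."""
    out, posture = [], None
    for a in actions:
        if a in NEEDS_SITTING and posture != "sit":
            out.append("sit"); posture = "sit"
        elif a in NEEDS_STANDING and posture != "stand":
            out.append("stand"); posture = "stand"
        elif a in NEEDS_LYING and posture != "lie":
            out.append("lie"); posture = "lie"
        out.append(a)
        if a in ("sit", "stand", "lie"):  # explicit posture changes
            posture = a
    return out
```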
Touch sensor states:
- `N` — no touch
- `R` — front touched (PiDog likes this → wag tail, pant)
- `L` — rear touched
- `RS` — slide front→rear (PiDog loves this → ecstatic reaction)
- `LS` — slide rear→front (PiDog dislikes this → shake head, back away)

Start the camera first (idempotent):
POST /api/v1/camera/start
Then ask a vision question:
```
POST /api/v1/agent/vision
Content-Type: application/json

{
  "question": "Who is in front of me? Are they waving?",
  "provider": "openrouter",
  "model": "meta-llama/llama-3.2-11b-vision-instruct"
}
```
Returns: `description`, `answer`, `actions[]`

The vision endpoint captures a frame from the camera, sends it to the configured vision model together with your question, and returns a scene description, a direct answer, and a list of suggested actions. Typical uses: recognizing and greeting people, checking whether someone is waving, simple security sweeps, and narrating the scene.
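A sketch of building this request body in Python. Whether `provider` and `model` may be omitted to fall back on the configured defaults is an assumption, not stated in this document.

```python
def build_vision_request(question, provider=None, model=None):
    """JSON body for POST /agent/vision.

    Call POST /camera/start first; the vision endpoint needs a live camera.
    """
    body = {"question": question}
    if provider is not None:
        body["provider"] = provider  # e.g. "openrouter" or "ollama"
    if model is not None:
        body["model"] = model  # e.g. "llava:7b" (assumed override field)
    return body
```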
```
POST /api/v1/agent/chat
Content-Type: application/json

{"message": "Do a trick to impress me!", "provider": "ollama"}
```
The LLM receives the PiDog skill document + live sensor context and responds with a JSON action plan that is automatically executed.
```
POST /api/v1/rgb/mode
Content-Type: application/json

{"style": "breath", "color": "cyan", "bps": 1.0, "brightness": 0.8}
```
| Style | Effect |
|---|---|
| monochromatic | Solid color |
| breath | Slowly pulses in and out |
| boom | Explodes from center outward |
| bark | Radiates from center (alarm) |
| speak | Oscillates center↔edges (talking) |
| listen | Sweeps left→right (listening) |
Colors: white, black, red, yellow, green, blue, cyan, magenta, pink, or hex #rrggbb
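A client-side sanity check for these values can catch typos before a request goes out; purely illustrative, since the API is assumed to validate them as well.

```python
import re

NAMED_COLORS = {"white", "black", "red", "yellow", "green",
                "blue", "cyan", "magenta", "pink"}
STYLES = {"monochromatic", "breath", "boom", "bark", "speak", "listen"}
HEX_RE = re.compile(r"^#[0-9a-fA-F]{6}$")

def valid_color(color):
    """True if color is one of the named colors or a #rrggbb hex string."""
    return color in NAMED_COLORS or bool(HEX_RE.match(color))

def build_rgb_request(style, color, bps=1.0, brightness=0.8):
    """JSON body for POST /rgb/mode, with client-side sanity checks."""
    if style not in STYLES:
        raise ValueError(f"unknown style: {style}")
    if not valid_color(color):
        raise ValueError(f"unknown color: {color}")
    return {"style": style, "color": color, "bps": bps, "brightness": brightness}
```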
```
POST /api/v1/servos/head

{"yaw": 45, "roll": 0, "pitch": -20, "speed": 60}
```
Ranges: yaw ±90°, roll ±70°, pitch -45° to +30°
```
POST /api/v1/servos/tail

{"angle": 60, "speed": 50}
```
Tail range: ±90°
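The ranges above, plus the sound-tracking yaw formula used later in this document (`(direction - 180) / 2`), lend themselves to small client-side helpers. A sketch: the helper names are mine, and applying the 0–100 speed range to servos (as for actions) is an assumption.

```python
def clamp(v, lo, hi):
    """Constrain v to the inclusive range [lo, hi]."""
    return max(lo, min(hi, v))

def head_request(yaw=0, roll=0, pitch=0, speed=60):
    """JSON body for POST /servos/head, clamped to the documented ranges."""
    return {"yaw": clamp(yaw, -90, 90),
            "roll": clamp(roll, -70, 70),
            "pitch": clamp(pitch, -45, 30),
            "speed": clamp(speed, 0, 100)}  # assumed 0-100 like actions

def yaw_for_sound(direction_deg):
    """Head yaw to face a sound source, per this document's sound-tracking
    recipe: (direction - 180) / 2, clamped to the ±90° yaw range."""
    return clamp((direction_deg - 180) / 2, -90, 90)
```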
```
GET /api/v1/sound/list     # List all available sounds
POST /api/v1/sound/play

{"name": "single_bark_1", "volume": 80}
```
```javascript
const ws = new WebSocket("ws://<pi-hostname>:8000/api/v1/ws");
// Wait for the connection to open before subscribing.
ws.onopen = () => ws.send(JSON.stringify({
  "type": "subscribe",
  "channels": ["sensors", "action_status", "status", "logs"]
}));
```
| Channel | Rate | Data |
|---|---|---|
| sensors | 5 Hz | distance, IMU, touch, sound |
| status | 0.2 Hz | battery, posture, uptime |
| action_status | on change | current action queue state |
| logs | as emitted | server log stream |
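A small Python counterpart to the JavaScript subscribe call above; the channel names mirror the table, the helper name and validation are my own.

```python
import json

CHANNELS = {"sensors", "action_status", "status", "logs"}

def subscribe_message(channels):
    """Build the subscribe frame sent after connecting to /api/v1/ws."""
    bad = set(channels) - CHANNELS
    if bad:
        raise ValueError(f"unknown channels: {sorted(bad)}")
    return json.dumps({"type": "subscribe", "channels": list(channels)})
```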
Obstacle avoidance: poll `GET /sensors/distance` in a loop. When distance < 20 cm:

- `POST /actions/execute` → `["stop", "backward"]`
- `POST /rgb/mode` → `{"style": "bark", "color": "red", "bps": 3}`

Sound tracking: poll `GET /sensors/sound`. When sound is detected:

- `POST /servos/head` with yaw set to `(direction - 180) / 2` to face the sound source
- `bark` if the direction persists

Fall detection: poll `GET /sensors/imu`. When pitch or roll exceeds ±30°:

- `["surprise"]` action + yellow `boom` LEDs

Touch reactions: poll `GET /sensors/touch` and react in real time:

- `R` or `RS` → `["wag tail", "pant"]` + pink `breath` LEDs
- `LS` → `["shake head", "backward"]` + red `monochromatic`
- `L` → `["scratch"]`

Security patrol:

- `POST /camera/start`
- Walk a square (`forward` × N, `turn right` × 4)
- `POST /agent/vision` with the question "Is anyone here who shouldn't be?"

Chain actions for a full crowd-pleasing routine:
```
{"actions": ["stand", "stretch", "sit", "handshake"]}
```
Then follow with: high five → push up → howling → wag tail
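The whole routine can also go out as one request body, which keeps the posture handling in a single queue; the speed value here is illustrative.

```python
routine = {
    # Order from the showcase recipe above: setup, then tricks in sequence.
    "actions": ["stand", "stretch", "sit", "handshake",
                "high five", "push up", "howling", "wag tail"],
    "speed": 80,  # illustrative; any 0-100 value works
}
```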
Read sensor state and pick an emotion:
- `doze off` + magenta `breath`
- `surprise` + red `boom`
- `pant` + pink `breath`
- `waiting` → random one of [`scratch`, `relax neck`, `feet shake`]

Start the MJPEG stream at `/camera/stream` and every 30 seconds call `/agent/vision` with:
"Briefly describe this scene in one sentence, then suggest a fun PiDog reaction."
Use the answer as a caption overlay or TTS narration.
| Problem | Solution |
|---|---|
| 503 on /agent/vision | Start camera first: POST /camera/start |
| 502 on /agent/vision | Vision model not running — use provider=openrouter or run ollama pull llava:7b |
| 422 on action execute | Invalid action name — check /actions for valid list |
| 422 "battery low" | Battery below 6.5V — charge before heavy movement |
| 429 rate limit | Max 10 actions/second — add delays between rapid commands |
| Actions not chaining | Include posture setup in same request: ["sit", "handshake"] |
| WebSocket disconnects | Reconnect and re-send subscribe message |
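For the WebSocket-disconnect row, reconnect pacing is worth sketching: on each reconnect, re-send the subscribe message, and back off between attempts so a down server is not hammered. The exponential-backoff policy below is my suggestion, not part of the API.

```python
import itertools

def backoff_delays(base=0.5, cap=10.0):
    """Exponential reconnect delays: 0.5, 1, 2, ... seconds, capped at `cap`.

    Usage sketch: on disconnect, sleep for the next delay, reopen the
    WebSocket, re-send the subscribe message, and reset the generator
    once a connection succeeds.
    """
    for n in itertools.count():
        yield min(cap, base * (2 ** n))
```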