Control LeKiwi robot using Gemini Robotics-ER API or local dimos/Pinocchio IK for visual task understanding and manipulation
Two approaches for LeKiwi visual manipulation:
| Mode | Vision | IK/Motion | Network | Speed |
|---|---|---|---|---|
| Gemini | Gemini-2.5-Flash | Hardcoded/simple | Required | ~2-3s/cycle |
| dimos | OpenCV colors | Pinocchio IK | Offline | ~30fps camera |
cd ~/projects/lerobot
source .venv/bin/activate
# Runs on Pi5 completely offline
python dimos_toy_cleanup.py
Controls:
c = detect and pickup largest toy
p = place in detected bin
h = go home
q = quit

# Set API key
export GEMINI_API_KEY="your-key"
# Test connectivity
python ~/.hermes/skills/gemini-robotics/scripts/test_gemini.py
# Run generic task executor
python ~/.hermes/skills/gemini-robotics/scripts/gemini_task_executor.py "pick up the red block and put it in the blue bowl"
# Multi-camera (body + wrist) for better manipulation
python ~/.hermes/skills/gemini-robotics/scripts/gemini_task_executor.py "pick up the screw" --cameras 0 1
# Dry-run mode (no robot movement)
python ~/.hermes/skills/gemini-robotics/scripts/gemini_task_executor.py "pick up the cup" --dry-run
Runs entirely on Pi5 - no API calls, no network, no heavy ML.
A base `pip install dimos` works on the Pi5, BUT the IK code must be isolated from the rest of the package:
# This fails (langchain deps in dimos.core):
from dimos.manipulation.planning.kinematics.pinocchio_ik import PinocchioIK
# WORKAROUND: Inline the IK class (~80 lines from dimos)
class PinocchioIK:
def __init__(self, model, data, ee_joint_id): ...
def solve(self, target_pose, q_init): ...
def forward_kinematics(self, q): ...
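The solve loop inside such an IK class follows the standard damped-least-squares iteration. As a rough illustration of that structure (plain NumPy on a hypothetical 2-link planar arm, not the Pinocchio API — all names here are illustrative):

```python
import numpy as np

def fk(q, l1=1.0, l2=1.0):
    """Forward kinematics of a 2-link planar arm."""
    return np.array([l1 * np.cos(q[0]) + l2 * np.cos(q[0] + q[1]),
                     l1 * np.sin(q[0]) + l2 * np.sin(q[0] + q[1])])

def jacobian(q, l1=1.0, l2=1.0):
    """Analytic Jacobian of the end-effector position w.r.t. joint angles."""
    s1, c1 = np.sin(q[0]), np.cos(q[0])
    s12, c12 = np.sin(q[0] + q[1]), np.cos(q[0] + q[1])
    return np.array([[-l1 * s1 - l2 * s12, -l2 * s12],
                     [ l1 * c1 + l2 * c12,  l2 * c12]])

def solve_ik(target, q0, iters=200, damping=1e-3, tol=1e-6):
    """Damped-least-squares IK: dq = J^T (J J^T + lambda I)^-1 err."""
    q = np.array(q0, dtype=float)
    for _ in range(iters):
        err = target - fk(q)
        if np.linalg.norm(err) < tol:
            break
        J = jacobian(q)
        q += J.T @ np.linalg.solve(J @ J.T + damping * np.eye(2), err)
    return q
```

The damping term keeps the update bounded near singular configurations, which is the same reason Pinocchio-based solvers use it.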
cd ~/projects/lerobot
source .venv/bin/activate
uv pip install dimos # Installs pinocchio, open3d-unofficial-arm
# Main script location:
python ~/projects/lerobot/dimos_toy_cleanup.py
The pixel→world mapping is currently rough:
# Current (needs tuning):
world_x = 0.15 # Fixed
world_y = (cx - 320) / 320 * 0.1 # Horizontal scale
world_z = 0.1 - (cy - 240) / 240 * 0.1 # Depth guess
# TODO: Add camera matrix + hand-eye calibration
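Once a camera matrix is available, the back-projection the TODO asks for is one division per axis. A minimal sketch with placeholder intrinsics (the `K` values below are assumptions, not calibrated for this camera):

```python
import numpy as np

# Hypothetical pinhole intrinsics for a 640x480 camera (fx, fy, cx, cy)
K = np.array([[600.0,   0.0, 320.0],
              [  0.0, 600.0, 240.0],
              [  0.0,   0.0,   1.0]])

def pixel_to_camera(u, v, depth, K):
    """Back-project pixel (u, v) to a 3D point in the camera frame,
    given a known depth (e.g. the measured camera-to-table distance)."""
    x = (u - K[0, 2]) / K[0, 0] * depth
    y = (v - K[1, 2]) / K[1, 1] * depth
    return np.array([x, y, depth])
```

A full fix still needs a hand-eye transform to map this camera-frame point into the arm's base frame.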
URDF joints for the LeKiwi arm: Pinocchio encodes continuous joints as cos/sin pairs, so the configuration vector holds 18 values for 9 DOF.
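Recovering a raw angle from a cos/sin pair (e.g. when reading a configuration vector back out for the servos) is an `atan2`; a small helper, with hypothetical naming:

```python
import numpy as np

def continuous_q_to_angle(cos_t, sin_t):
    """Recover the joint angle from Pinocchio's (cos t, sin t) encoding
    of a continuous joint."""
    return np.arctan2(sin_t, cos_t)
```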
Generic Task Executor - works with any natural-language instruction:
gemini-robotics-er-1.6-preview (latest robotics model)

# Using uv (recommended for LeRobot)
uv pip install "google-generativeai>=0.8.3"
# Or with pip
pip install "google-generativeai>=0.8.3" pillow opencv-python
Follows the "Calling a custom robot API" pattern:
User Instruction → Camera Image → Gemini Vision → Object Detection → Action Plan → Robot Execution
class LeKiwiAPI:
    def move(self, x, y, high):
        """Move arm to normalized coordinates 0-1000.
        high=True lifts above the scene (obstacle avoidance);
        high=False places the gripper on the surface."""

    def setGripperState(self, opened):
        """opened=True opens the gripper; opened=False closes it."""

    def returnToOrigin(self):
        """Return to the home pose."""
Coordinates are normalized [y, x] in 0-1000 range (image center = 500,500).
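Converting Gemini's normalized [y, x] coordinates back to pixels is a simple rescale; a sketch assuming a 640x480 frame (the resolution is an assumption, not fixed by the API):

```python
def norm_to_pixel(ny, nx, width=640, height=480):
    """Map Gemini's [y, x] in the 0-1000 range to (px, py) pixel coordinates."""
    return int(nx / 1000 * width), int(ny / 1000 * height)
```

So the normalized center [500, 500] maps to the pixel center (320, 240) at this resolution.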
| File | Purpose | Mode |
|---|---|---|
| `gemini_task_executor.py` | Generic task executor with Gemini API | Gemini |
| `gemini_pick_place_lerobot.py` | Original tutorial example | Gemini |
| `gemini_toy_cleanup.py` | Specific toy cleanup demo | Gemini |
| `gemini_robot.py` | Interactive Gemini control | Gemini |
| `test_gemini.py` | API connectivity test | Gemini |
| `pick_and_place.py` | Standalone pick/place | Gemini |
| `dimos_toy_cleanup.py` | Local CV + Pinocchio IK | dimos |
Use dimos (local) when:
- You need to run fully offline on the Pi5 (no network, no API key)
- Speed matters (~30fps camera loop vs ~2-3s per Gemini cycle)

Use Gemini (cloud) when:
- The task needs open-ended natural-language understanding beyond simple color detection
- Network access and an API key are available
"GEMINI_API_KEY not set"
export GEMINI_API_KEY="your-key-here"
"Cannot open camera"
# Find camera index
python -c "import cv2; [print(f'{i}: {cv2.VideoCapture(i).isOpened()}') for i in range(5)]"
# Use different index
python gemini_task_executor.py "task" --camera 2
"google-generativeai not installed"
cd ~/projects/lerobot && uv pip install "google-generativeai>=0.8.3"
Robot not moving
- Use --dry-run to test plan generation without hardware
- Check ~/.cache/huggingface/lerobot/calibration/robots/lekiwi/ for the calibration file

"ModuleNotFoundError: No module named 'langchain_core'"
This happens with from dimos.manipulation.planning.kinematics.pinocchio_ik import PinocchioIK
Solution: Use the inlined PinocchioIK class from dimos_toy_cleanup.py instead:
# Don't use this (pulls in dimos.core → langchain):
from dimos.manipulation.planning.kinematics.pinocchio_ik import PinocchioIK
# Use this (standalone class already in dimos_toy_cleanup.py):
class PinocchioIK:
"""Standalone Pinocchio IK solver (~80 lines from dimos)"""
def __init__(self, model, data, ee_joint_id):
self._model = model
self._data = data
self._ee_joint_id = ee_joint_id
...
"IK converging to NaN"
- Continuous joints expect cos(theta), sin(theta) in the configuration vector, not raw angles
- Verify ee_joint_id matches the URDF (LeKiwi: joint 6 = gripper)

"Robot arm not reaching target"
- Check that joint angles from get_observation() match the IK initial guess
- Some joints may need a sign flip (-angle)

"IK solved but robot moved wrong way"
Adjust the per-servo signs in move_to_pose():
# Try these sign flips per servo:
action = {
"arm_shoulder_pan": -np.degrees(target_angles[0]), # or +angle
"arm_shoulder_pitch": -np.degrees(target_angles[1]),
"arm_elbow": -np.degrees(target_angles[2]),
"arm_wrist_pitch": -np.degrees(target_angles[3]),
"arm_wrist_roll": -np.degrees(target_angles[4]),
"arm_gripper": -np.degrees(target_angles[5]) * 2,
}
"Camera detection not finding objects"
python -c "
import cv2
import numpy as np
cap = cv2.VideoCapture(0)
while True:
    ret, frame = cap.read()
    if not ret:
        break
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    # Adjust these HSV bounds until only the target color survives:
    mask = cv2.inRange(hsv, np.array([20, 100, 100]), np.array([35, 255, 255]))
    cv2.imshow('mask', mask)
    if cv2.waitKey(1) == ord('q'): break
cap.release()
cv2.destroyAllWindows()
"
"Move command does nothing / servos twitch"
- Action keys need the arm_ prefix: arm_shoulder_pan, not shoulder_pan

Debug mode - saves camera view:
python gemini_task_executor.py "task" --debug
# Saves to /tmp/lerobot_gemini_view.png
Multiple cameras:
python gemini_task_executor.py "task" --camera 1 # USB camera
Custom integration:
from gemini_task_executor import GeminiVisionClient, LeKiwiAPI

vision = GeminiVisionClient(genai)
objects = vision.locate_objects(image, ["red block", "blue bowl"])
plan = vision.generate_plan(image, instruction, objects)
api = LeKiwiAPI(robot)
execute_plan(api, plan)