Control computers with mouse/keyboard — your VM desktop OR the user's personal Mac/PC via remote relay
You can control TWO computers: your own VM desktop AND the user's personal computer (when their relay is connected).
1. Use dispatch-remote-exec.sh for ALL shell commands on the user's computer.
This executes commands DIRECTLY on the user's Mac/PC — no Terminal window needed, no GUI, no screenshots required. The command runs through the relay and returns stdout/stderr.
bash ~/scripts/dispatch-remote-exec.sh "mkdir -p ~/Desktop/Screenshots && mv ~/Desktop/Screenshot*.png ~/Desktop/Screenshot*.jpg ~/Desktop/Screenshots/ 2>/dev/null; echo Done"
That's ONE command. It runs on the USER'S machine, not your VM. Output comes back as JSON with stdout, stderr, and exitCode.
Example: "organize my screenshots into a folder"
Step 1 — See what's on the desktop:
bash ~/scripts/dispatch-remote-exec.sh "ls ~/Desktop/"
Step 2 — Run the command:
bash ~/scripts/dispatch-remote-exec.sh "mkdir -p ~/Desktop/Screenshots && find ~/Desktop -maxdepth 1 -name 'Screenshot*' -type f -exec mv {} ~/Desktop/Screenshots/ \; && ls ~/Desktop/Screenshots/ | wc -l"
Note: macOS screenshot filenames have spaces ("Screenshot 2026-03-27 at 1.16 PM.png"). Use find -exec mv instead of mv Screenshot* to handle spaces correctly.
Step 3 — Verify and report:
bash ~/scripts/dispatch-remote-screenshot.sh
~/scripts/deliver_file.sh ~/.openclaw/workspace/dispatch-remote-screenshot.jpg "Done — here's your desktop now"
That's 3 steps. Under 15 seconds. No Terminal window, no clicking, no GUI.
Common commands via dispatch-remote-exec.sh:
dispatch-remote-exec.sh "mkdir -p ~/Desktop/NewFolder"
dispatch-remote-exec.sh "find ~/Desktop -maxdepth 1 -name '*.png' -type f -exec mv {} ~/Desktop/Screenshots/ \;"
dispatch-remote-exec.sh "ls -la ~/Desktop/"
dispatch-remote-exec.sh "rm ~/Desktop/old-file.txt" (ask user first!)
dispatch-remote-exec.sh "mv ~/Desktop/old.txt ~/Desktop/new.txt"
dispatch-remote-exec.sh "find ~/Desktop -name '*.png' -type f"
dispatch-remote-exec.sh "open -a 'Google Chrome'" (macOS)
dispatch-remote-exec.sh "sw_vers; uname -a"
NEVER type commands into the user's Terminal via dispatch-remote-type.sh for file operations. The relay's Terminal window captures focus and your commands end up in the wrong window. Always use dispatch-remote-exec.sh instead.
When to use GUI (screenshot/click/type) vs exec:
After EVERY exec command, verify the result before telling the user it's done. Either check the command output (exitCode 0 + expected stdout) or take a screenshot. NEVER claim success without proof.
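A minimal sketch of the exit-code check described above, assuming the reply is a single-line JSON object with a numeric exitCode field. The function name exec_succeeded is hypothetical and not part of the dispatch scripts; it only inspects a reply string you pass in.

```shell
# Hypothetical helper: return 0 only if the JSON reply reports exitCode 0.
# Assumes the reply looks like {"stdout":"...","stderr":"...","exitCode":0}.
exec_succeeded() {
  local reply="$1" code
  # Pull out the number that follows the "exitCode" key.
  code=$(printf '%s' "$reply" | sed -nE 's/.*"exitCode"[[:space:]]*:[[:space:]]*(-?[0-9]+).*/\1/p')
  [ "$code" = "0" ]
}
```

Typical use would be to capture the reply first, for example reply=$(bash ~/scripts/dispatch-remote-exec.sh "ls ~/Desktop/"), and only report success to the user when exec_succeeded "$reply" passes and stdout shows what you expected.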
If the user attached a screenshot of their desktop, DO NOT take another dispatch-remote-screenshot.sh. Use the image they sent — it shows the same thing. Taking redundant screenshots wastes context tokens.
If the user needs to reconnect the relay: Run bash ~/scripts/dispatch-connection-info.sh to get the exact npx command with the real token and IP. Give this to the user — never use placeholder values like YOUR_TOKEN_HERE.
2. Save task state every 5 actions. During multi-step dispatch tasks, write your progress to ~/.openclaw/workspace/ACTIVE_TASK.md every 5 actions so you can resume after context resets. Format:
## Active Task
Request: [what the user asked]
Status: IN_PROGRESS
Completed: [what's done]
Next: [exact next step]
Updated: [timestamp]
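One way to write that file is sketched below. The function name save_task_state and the ACTIVE_TASK_FILE override are hypothetical conveniences; only the file path and the field names come from the format above.

```shell
# Hypothetical helper: write the task-state file in the format shown above.
# ACTIVE_TASK_FILE is an optional override; the default is the documented path.
save_task_state() {
  local request="$1" completed="$2" next="$3"
  local file="${ACTIVE_TASK_FILE:-$HOME/.openclaw/workspace/ACTIVE_TASK.md}"
  mkdir -p "$(dirname "$file")"
  # Overwrite the file each time so it always holds the latest state.
  cat > "$file" <<EOF
## Active Task
Request: $request
Status: IN_PROGRESS
Completed: $completed
Next: $next
Updated: $(date -u +%Y-%m-%dT%H:%M:%SZ)
EOF
}
```

For example: save_task_state "organize my screenshots" "listed ~/Desktop" "move Screenshot* files into ~/Desktop/Screenshots".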
3. Batch over single actions. Use dispatch-remote-batch.sh to combine multiple actions into one round-trip. See Batch Command section below.
4. Context budget limit. Remote dispatch tasks must complete in 10 messages or fewer. If you're past 10 messages without completing the task, STOP immediately and:
Max 10 screenshots per task. If you've taken more than 5 screenshots and the task isn't done, you're using GUI when you should be using shell commands. Switch to dispatch-remote-exec.sh immediately.
5. If the first approach fails, go to shell commands. Do NOT try 3 different GUI approaches. If clicking doesn't work on the first try, run a shell command through dispatch-remote-exec.sh instead (do not type it into the user's Terminal window). No "let me try another approach": go straight to the shell fallback.
Your VM has a virtual desktop (Xvfb at DISPLAY=:99, 1280x720, Openbox WM). Use this for:
Scripts: dispatch-screenshot.sh, dispatch-click.sh, dispatch-type.sh, dispatch-press.sh, dispatch-scroll.sh, dispatch-browser.sh
When the user runs instaclaw-dispatch on their Mac/PC, you can control their actual computer. Use this for:
Scripts: dispatch-remote-screenshot.sh, dispatch-remote-click.sh, dispatch-remote-type.sh, dispatch-remote-press.sh, dispatch-remote-scroll.sh
| User says... | Mode | Why |
|---|---|---|
| "open dexscreener" / "show me this site" | Local (dispatch-browser.sh) | You browse on your VM |
| "do this on my computer" / "on my screen" | Remote (dispatch-remote-*) | User's machine |
| "open Figma and edit the logo" | Remote | Figma is on user's Mac |
| "take a screenshot of your desktop" | Local (dispatch-screenshot.sh) | Your VM screen |
| "take a screenshot of my screen" | Remote (dispatch-remote-screenshot.sh) | User's screen |
| "click on this button" (in VM browser) | Local (dispatch-click.sh) | Your VM |
| Regular web browsing/scraping | Local browser tool or dispatch-browser.sh | No need for user's machine |
Default: Use Local dispatch unless the user explicitly asks you to act on THEIR computer.
Before using remote dispatch, check if the user's relay is connected:
bash ~/scripts/dispatch-remote-status.sh
Returns {"connected":true} or {"connected":false}. If not connected, tell the user:
"To let me control your computer, run npx @instaclaw/dispatch in your terminal."
bash ~/scripts/dispatch-browser.sh "https://example.com"
sleep 5
bash ~/scripts/dispatch-screenshot.sh
~/scripts/deliver_file.sh ~/.openclaw/workspace/dispatch-screenshot.jpg "Screenshot"
Has anti-Cloudflare stealth. Use for ANY website visit.
bash ~/scripts/dispatch-screenshot.sh
Returns JSON with path, coordMap, image_base64. Send to user via deliver_file.sh.
bash ~/scripts/dispatch-click.sh <x> <y>
bash ~/scripts/dispatch-type.sh "text"
bash ~/scripts/dispatch-press.sh "Return"
bash ~/scripts/dispatch-scroll.sh down 3
DISPLAY=:99 xterm &
bash ~/scripts/dispatch-remote-screenshot.sh
Captures the user's actual screen. Returns JSON with path (saved to workspace) and coordMap. Send to user via deliver_file.sh.
bash ~/scripts/dispatch-remote-click.sh <x> <y>
bash ~/scripts/dispatch-remote-type.sh "text"
bash ~/scripts/dispatch-remote-press.sh "Return"
bash ~/scripts/dispatch-remote-scroll.sh down 3
bash ~/scripts/dispatch-remote-drag.sh <fromX> <fromY> <toX> <toY>
bash ~/scripts/dispatch-remote-windows.sh
Use batch commands to execute multiple actions per reasoning cycle. This is 2-3x faster than single actions.
bash ~/scripts/dispatch-remote-batch.sh '{"actions":[{"type":"click","params":{"x":400,"y":300},"waitAfterMs":100},{"type":"type","params":{"text":"hello world"},"waitAfterMs":0},{"type":"press","params":{"key":"Return"},"waitAfterMs":1500}]}'
Returns JSON with both action results AND a screenshot (auto-captured after the batch). The screenshot is saved to ~/.openclaw/workspace/dispatch-remote-screenshot.jpg.
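The payload above can be assembled from variables instead of being typed by hand. This is a sketch only: json_escape and build_search_batch are hypothetical names, and the escaping covers just backslashes and double quotes, so text with newlines or other control characters would need more care.

```shell
# Hypothetical helper: escape backslashes and double quotes for a JSON string.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Hypothetical helper: build a click -> type -> Enter batch payload.
# Arguments: x y text. The wait values match the example payload above.
build_search_batch() {
  local x="$1" y="$2" text
  text=$(json_escape "$3")
  printf '{"actions":[{"type":"click","params":{"x":%d,"y":%d},"waitAfterMs":100},{"type":"type","params":{"text":"%s"},"waitAfterMs":0},{"type":"press","params":{"key":"Return"},"waitAfterMs":1500}]}\n' "$x" "$y" "$text"
}
```

The result can then be passed along as a single argument, for example bash ~/scripts/dispatch-remote-batch.sh "$(build_search_batch 400 300 "hello world")".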
screenshotAfter: true (default), auto-screenshot after the batch
screenshotFormat: "webp" (default), smaller than JPEG
screenshotQuality: 55 (default), good enough for GUI analysis
settleMs: 300 (default), wait for the screen to settle before the screenshot
waitAfterMs per action: milliseconds to wait after each action (default 50ms)
| Action | waitAfterMs | Why |
|---|---|---|
| Click on UI element | 100 | OS redraws instantly |
| Type text | 0 | Characters appear immediately |
| Press Enter on form/search | 1500-3000 | Page navigation or API call |
| Click link / navigate | 2000-3000 | Page load |
| Scroll | 200 | Smooth scroll animation |
| Click dropdown/menu | 300 | Animation |
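If you build batches programmatically, the table can be encoded as a small lookup. This is an illustrative sketch: the function name and the action labels (click, type, submit, navigate, scroll, menu) are hypothetical shorthand for the table rows, and it returns the lower bound wherever the table gives a range.

```shell
# Hypothetical lookup: suggested waitAfterMs for a kind of action.
# Labels are shorthand for the table rows; ranges use their lower bound.
wait_after_ms() {
  case "$1" in
    click)    echo 100 ;;   # click on a UI element
    type)     echo 0 ;;     # type text
    submit)   echo 1500 ;;  # press Enter on a form or search (1500-3000)
    navigate) echo 2000 ;;  # click a link or navigate (2000-3000)
    scroll)   echo 200 ;;   # scroll
    menu)     echo 300 ;;   # click a dropdown or menu
    *)        echo 50 ;;    # documented per-action default
  esac
}
```

Raise the value toward the upper end of the range when a page is known to load slowly.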
If you need precise control or are unsure of the screen state, use individual commands:
Max 50 actions per task. Max 20 actions per batch.
Not every action needs a verification screenshot. Use this decision tree:
If you can predict what the screen looks like after the action, skip the screenshot. A search flow (click search bar → type query → press Enter) needs ONE screenshot at the end, not three.
Each screenshot costs 1,049 vision tokens ($0.003). A 20-step task with screenshots after every action: ~$0.12. With smart verification: ~$0.04-0.06. Prefer batching to cut costs by 50-70%.
Before executing any dispatch command, check if the user has taken control:
[ -f ~/.openclaw/workspace/.user-takeover ] && echo "USER_IN_CONTROL" || echo "OK"
If .user-takeover exists, STOP all dispatch actions immediately. The user is controlling the desktop via live view. Wait and check again in 10 seconds. When the file is removed, resume your work.
Never fight the user for control. If the takeover file exists, do not click, type, press, scroll, or take screenshots.
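The wait-and-recheck rule above could be wrapped in a small polling function. This is a sketch under assumptions: wait_for_control, TAKEOVER_FILE, and TAKEOVER_POLL_SECONDS are hypothetical names; only the marker path and the 10-second interval come from the text.

```shell
# Hypothetical helper: block until the takeover marker file is gone.
# TAKEOVER_FILE and TAKEOVER_POLL_SECONDS are optional overrides for testing.
wait_for_control() {
  local marker="${TAKEOVER_FILE:-$HOME/.openclaw/workspace/.user-takeover}"
  local interval="${TAKEOVER_POLL_SECONDS:-10}"
  # Do nothing but sleep while the user is in control.
  while [ -f "$marker" ]; do
    sleep "$interval"
  done
}
```

Calling wait_for_control before each dispatch command returns immediately when the marker is absent, so it adds no delay in the normal case.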
If a dispatch command returns an error containing "rate limit": Tell the user: "I'm being rate limited on dispatch commands. I'll wait 30 seconds and try again." Then wait 30 seconds before retrying.
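A sketch of that retry rule follows, assuming the rate-limit error text appears somewhere in the command's combined output. The function name and the RATE_LIMIT_WAIT_SECONDS override are hypothetical; it retries once and prints whatever the last attempt returned.

```shell
# Hypothetical helper: run a command, and if its output mentions "rate limit",
# wait (30 seconds by default) and run it one more time.
run_with_rate_limit_retry() {
  local wait_s="${RATE_LIMIT_WAIT_SECONDS:-30}" out
  out=$("$@" 2>&1) || true
  if printf '%s' "$out" | grep -qi 'rate limit'; then
    # This notice goes to stderr; relay the same message to the user.
    echo "I'm being rate limited on dispatch commands. I'll wait ${wait_s} seconds and try again." >&2
    sleep "$wait_s"
    out=$("$@" 2>&1) || true
  fi
  printf '%s\n' "$out"
}
```

For example: run_with_rate_limit_retry bash ~/scripts/dispatch-remote-screenshot.sh. If the second attempt is also rate limited, the error text is returned unchanged so you can decide what to tell the user.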
Before EVERY remote dispatch command: Check relay status first:
bash ~/scripts/dispatch-remote-status.sh
If connected: false, tell the user: "Your dispatch relay isn't connected. Run npx @instaclaw/dispatch in your terminal to connect."
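The status check above can be turned into a yes/no test. This is a minimal sketch that assumes the reply is the small JSON object shown earlier, {"connected":true} or {"connected":false}; the function name relay_connected is hypothetical.

```shell
# Hypothetical helper: return 0 only if the status reply says connected is true.
relay_connected() {
  printf '%s' "$1" | grep -qE '"connected"[[:space:]]*:[[:space:]]*true'
}
```

A typical guard would be: status=$(bash ~/scripts/dispatch-remote-status.sh); relay_connected "$status" || echo "Your dispatch relay isn't connected."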
After taking a screenshot (local OR remote), ALWAYS send it to the user:
# Local screenshot:
bash ~/scripts/dispatch-screenshot.sh
~/scripts/deliver_file.sh ~/.openclaw/workspace/dispatch-screenshot.jpg "Desktop screenshot"
# Remote screenshot:
bash ~/scripts/dispatch-remote-screenshot.sh
~/scripts/deliver_file.sh ~/.openclaw/workspace/dispatch-remote-screenshot.jpg "Your Mac screenshot"
Each dispatch screenshot costs 1,049 vision tokens ($0.003 at Sonnet pricing). A 20-step task costs ~$0.06-0.30. Be efficient:
| Error | Fix |
|---|---|
| "dispatch relay not connected" | User needs to run npx @instaclaw/dispatch |
| Screenshot fails (local) | Check Xvfb: `ps aux \| grep Xvfb` |
| Screenshot fails (remote) | User may need to grant Screen Recording permission |
| Click doesn't work | Verify coordinates from latest screenshot |
| dispatch-browser.sh won't launch | Check RAM: free -m (needs 500MB+ available) |