Use when you need to visually interact with a GUI — test buttons, fill forms, verify visual layouts, fuzz web pages, automate user flows, take screenshots, or perform end-to-end QA on any application. Works on cloud VMs, Docker containers, local machines, and sandboxes. Install: pip install cua.
CUA gives you eyes and hands on a real computer: see the screen, move the mouse, click, type, drag, and manage windows — like a human at the keyboard.
Use this skill for visual interaction that can't be done via shell or API.
cua --version # check install; if missing: pip install cua
# Connect to target (pick one)
cua do switch cloud my-vm
cua do switch docker my-container
cua do-host-consent && cua do switch host # local machine (one-time consent)
ANTHROPIC_API_KEYis optional. With it,cua do snapshotreturns an AI-annotated screen with element coordinates. Without it, usescreenshotand read the image yourself.
Look → Act → Verify — repeat until done, then share:
cua do screenshot # look
cua do click 450 280 # act
cua do screenshot # verify
cua trajectory share # share replay link with user
Re-screenshot after every UI change — coordinates go stale when the screen changes.
cua do screenshot
cua do click 450 280
cua do screenshot
cua do screenshot
cua do click 400 200 && cua do type "Jane Doe"
cua do key tab && cua do type "[email protected]"
cua do key tab && cua do type "SecureP@ss123"
cua do click 400 500
cua do screenshot
cua do click 350 400 # "Choose File"
cua do type "/home/user/report.pdf"
cua do key enter
cua do screenshot
When clicking small or dense UI elements — especially on the host machine — zoom into the target window first. Coordinates become window-relative and screenshots show only that window, giving you higher effective resolution.
cua do zoom "Google Chrome" # crop to Chrome window; coords are now window-relative
cua do screenshot # zoomed view — easier to locate small elements
cua do click 112 44 # precise click on a small tab or button
cua do screenshot # verify
cua do unzoom # restore full-screen coords when done
cua do screenshot # back to full desktop view
Use
zoomany time click accuracy is uncertain.unzoombefore switching windows or when you need to see the full desktop again.
cua do window ls # list open windows
cua do drag 150 300 650 400 # source → destination
cua do screenshot
cua do screenshot
cua do click 400 200
cua do type "<script>alert(1)</script>"
cua do key tab && cua do type "'; DROP TABLE users; --"
cua do key tab && cua do type "AAAAAAAAAAAAAAAAAAAAAAA"
cua do click 400 500
cua do screenshot # check for errors, crashes, unexpected behavior
Every action is auto-recorded to ~/.cua/trajectories/{machine}/{session}/.
cua trajectory share # upload and get shareable HTTPS link (always do this at end)
cua trajectory ls # list sessions
cua trajectory export # generate HTML report
cua do --no-record click 100 200 # disable recording for a single action
Tell the user: "Here is the trajectory of my session: {url}"
| Action | Command |
|---|---|
| Connect to target | cua do switch <provider> [name] |
| Screenshot | cua do screenshot |
| AI-annotated screen | cua do snapshot ["instructions"] |
| Click | cua do click <x> <y> [left|right|middle] |
| Double-click | cua do dclick <x> <y> |
| Type text | cua do type "text" |
| Press key | cua do key <key> |
| Hotkey | cua do hotkey <combo> (e.g. ctrl+c) |
| Scroll | cua do scroll <direction> [amount] |
| Drag | cua do drag <x1> <y1> <x2> <y2> |
| Move cursor | cua do move <x> <y> |
| Shell command | cua do shell "command" |
| Open URL/file | cua do open <url|path> |
| List windows | cua do window ls [app] |
| Focus window | cua do window focus <id> |
| Zoom to window | cua do zoom "App Name" |
| Unzoom | cua do unzoom |
| Share trajectory | cua trajectory share |
| Provider | Example |
|---|---|
cloud | cua do switch cloud my-vm |
cloudv2 | cua do switch cloudv2 my-vm |
docker | cua do switch docker my-container |
lume | cua do switch lume my-vm |
lumier | cua do switch lumier my-vm |
winsandbox | cua do switch winsandbox |
host | cua do switch host |
See references/command-reference.md for full argument syntax.