Name: Agent Device
Author: Shopify

mobile_swipe_on_screen

mobile_type_keys

mobile_press_button

mobile_long_press_on_screen_at_coordinates

mobile_*

# Probe: check if a session already owns this device
agent-device appstate --device "iPhone 16"

Outcome	What it means	What to do
Succeeds	Default session already owns this device	Use it — no `open` needed
`Device is already in use by session "X"`	Session `X` owns this device	Use `--session X` for all commands (no `open` needed)
`Session "default" is bound to device "Y"`	Default session owns a different device	Use a new `--session <name>` and proceed with `open`
`No active session` / device not found	No session exists yet	Proceed with `open --device "<name>"` normally
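
The outcome table above can be turned into a small decision helper. The sketch below is illustrative only: `decide_session` is a hypothetical function name, and it assumes the probe's error text matches the messages quoted in the table.

```shell
# Hypothetical helper: map a probe result to the next step.
# Args: $1 = exit status of the appstate probe, $2 = its combined output.
# Prints one of: "use-default", "use-session <name>", "new-session", "open".
decide_session() {
  status=$1
  output=$2
  q='"'
  if [ "$status" -eq 0 ]; then
    echo "use-default"          # default session already owns the device
    return 0
  fi
  case "$output" in
    *"Device is already in use by session "*)
      # Extract the session name between the first pair of double quotes
      name="${output#*in use by session ${q}}"
      name="${name%%${q}*}"
      echo "use-session $name"
      ;;
    *"Session ${q}default${q} is bound to device"*)
      echo "new-session"        # pick a new --session <name>, then open
      ;;
    *)
      echo "open"               # no session yet: open --device "<name>"
      ;;
  esac
}

# Usage (requires agent-device on PATH):
#   output=$(agent-device appstate --device "iPhone 16" 2>&1); status=$?
#   decide_session "$status" "$output"
```

Keeping the decision separate from the probe call makes it easy to check the mapping without a device attached.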

# First: discover available devices
agent-device devices

# Probe for existing session
agent-device appstate --device "iPhone 16"
# If error says "in use by session ios16" → use --session ios16
# If succeeds → default session works, skip open

# Only if no session exists: open with explicit device targeting
agent-device open FlatListPro --device "iPhone 16"

# Subsequent commands — no --device needed (session remembers)
agent-device snapshot -i -c --json   # primary: get elements with rects
agent-device press <x> <y>
agent-device screenshot /tmp/verify.png && sips --resampleHeight 852 /tmp/verify.png >/dev/null  # verification

# Probe for existing session
agent-device appstate --device "Android35" --session droid --platform android
# If succeeds → session already exists, skip open

# Only if no session exists: bind session to Android (replace <package> with actual package name)
agent-device open <package> --session droid --platform android \
  --device "Android35" \
  --activity <package>/.MainActivity

# All subsequent commands: just --session droid
agent-device snapshot -i -c --json --session droid  # primary
agent-device press <x> <y> --session droid
agent-device screenshot /tmp/verify.png --session droid && sips --resampleHeight 852 /tmp/verify.png >/dev/null  # verification

agent-device snapshot -i -c --json

{
  "@ref": "@e25",
  "role": "button",
  "label": "Settings",
  "rect": {"x": 141, "y": 2032, "width": 154, "height": 154}
}
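
Computing the press point from a snapshot element is plain arithmetic on its `rect`. A minimal sketch, assuming the rect values are already in the units that `press` expects; `rect_center` is a hypothetical helper, not part of agent-device.

```shell
# Hypothetical helper: print the center of a rect as "<x> <y>".
# Args: x y width height (integers, taken from the element's "rect").
rect_center() {
  echo "$(( $1 + $3 / 2 )) $(( $2 + $4 / 2 ))"
}

# For the Settings button above: rect_center 141 2032 154 154
# The two numbers can then be passed to: agent-device press <x> <y>
```

Integer division is used on purpose, since press coordinates are whole numbers.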

agent-device screenshot /tmp/verify.png && sips --resampleHeight 852 /tmp/verify.png >/dev/null

agent-device screenshot /tmp/screen.png  # (add --session droid for Android)
sips -g pixelWidth -g pixelHeight /tmp/screen.png

# Android: press coords = screenshot pixels
PRESS_W = RAW_W
PRESS_H = RAW_H

# iOS: press coords = screenshot pixels / scale
PRESS_W = RAW_W / 3    # (use /2 for iPhone SE or iPad)
PRESS_H = RAW_H / 3

x = PRESS_W * (x_percent / 100)
y = PRESS_H * (y_percent / 100)
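
The formulas above can be combined into one function. This is a sketch under stated assumptions: `percent_to_press` is a hypothetical name, the scale is 1 for Android and 3 (or 2) for iOS as described above, and results are truncated to whole numbers.

```shell
# Hypothetical helper: convert a percentage position to press coordinates.
# Args: raw_w raw_h scale x_percent y_percent (all integers).
#   raw_w/raw_h: screenshot size in pixels (from sips -g pixelWidth -g pixelHeight)
#   scale: 1 on Android, 3 on most iPhones, 2 on iPhone SE or iPad
percent_to_press() {
  echo "$(( $1 * $4 / ($3 * 100) )) $(( $2 * $5 / ($3 * 100) ))"
}

# Example: a 1179x2556 screenshot at scale 3, target at 50% across and 35% down
#   percent_to_press 1179 2556 3 50 35
```

Multiplying before dividing keeps the rounding error below one press unit.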

open FlatListPro --device "iPhone 16"    # Launch app (iOS — always specify --device on first open)
open <package> \                          # Launch app (Android — discover package name with `apps --platform android --user-installed`)
  --device "Android35" --session droid --platform android \
  --activity <package>/.MainActivity
close FlatListPro                         # Kill app
back                                  # Navigate back (Android: in-app; iOS: may go to previous app)
home                                  # Device home screen
app-switcher                          # Open app switcher

snapshot -i -c --json                 # Interactive elements with rects (primary method)

press <x> <y>                         # Tap (iOS=logical points, Android=pixels)
press <x> <y> --double-tap            # Double-tap
longpress <x> <y> [durationMs]        # Long press (default 500ms)
type "text"                           # Type into focused field (tap input first)
scroll <up|down|left|right> [0-1]     # Scroll in direction with amount
swipe <x1> <y1> <x2> <y2> [durationMs]  # Precise swipe between coordinates
wait <ms>                             # Wait milliseconds (max 500ms — the app is fast)

screenshot /tmp/screen.png            # Save screenshot
record start ./recording.mov          # Start video recording (iOS only — see below for Android)
record stop                           # Stop recording (iOS only)

agent-device screenshot /tmp/screen.png && sips --resampleHeight 852 /tmp/screen.png >/dev/null

SERIAL=$(adb devices | grep -w device | head -1 | cut -f1)

# Start (run in background)
adb -s "$SERIAL" shell screenrecord /sdcard/agent-rec.mp4 &

# Stop (SIGINT the on-device process, then pull)
adb -s "$SERIAL" shell kill -INT $(adb -s "$SERIAL" shell pidof screenrecord)
sleep 2
adb -s "$SERIAL" pull /sdcard/agent-rec.mp4 /tmp/recording.mp4
adb -s "$SERIAL" shell rm -f /sdcard/agent-rec.mp4

devices                               # List available devices
apps --platform ios --user-installed  # List installed apps
appstate                              # Show foreground app/activity (useful on Android)
keyboard dismiss                      # Dismiss on-screen keyboard (Android)
clipboard read                        # Read clipboard (iOS only)
clipboard write "text"                # Write to clipboard

settings appearance dark              # Switch to dark mode
settings appearance light             # Switch to light mode
settings wifi off                     # Toggle wifi
settings permission grant camera      # Grant camera permission

# Step 1: Start recording (separate Bash call)
agent-device record start /tmp/evidence.mov --session ios

# Step 2: Wait + perform action + wait (separate Bash call)
sleep 3 && agent-device swipe 197 340 197 680 800 --session ios && sleep 5

# Step 3: Stop recording (separate Bash call)
agent-device record stop --session ios

# Step 1: Start recording (separate Bash call)
adb -s $SERIAL shell screenrecord /sdcard/agent-rec.mp4 &

# Step 2: Wait + perform action + wait (separate Bash call)
sleep 3 && agent-device swipe 540 700 540 1400 800 --session droid && sleep 5

# Step 3: Stop + pull recording (separate Bash call)
adb -s $SERIAL shell kill -INT $(adb -s $SERIAL shell pidof screenrecord) && sleep 2 && adb -s $SERIAL pull /sdcard/agent-rec.mp4 /tmp/evidence.mp4 && adb -s $SERIAL shell rm -f /sdcard/agent-rec.mp4

# Find which frames are unique (not identical to the previous frame)
prev_hash=""
for f in /tmp/frames/frame-*.png; do
  hash=$(md5 -q "$f")
  if [[ "$hash" != "$prev_hash" ]]; then
    echo "$(basename "$f"): CHANGED"
    prev_hash="$hash"
  fi
done
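
The loop above relies on `md5 -q`, which is the macOS form of the command. A portable variant might look like the sketch below; `file_hash` and `changed_frames` are hypothetical helper names, and the fallback order (`md5sum`, then `md5`, then `cksum`) is an assumption about which tools are installed.

```shell
# Hypothetical helper: print a content hash using whichever tool is available.
file_hash() {
  if command -v md5sum >/dev/null 2>&1; then
    md5sum "$1" | cut -d' ' -f1     # typical on Linux
  elif command -v md5 >/dev/null 2>&1; then
    md5 -q "$1"                     # macOS
  else
    cksum "$1" | cut -d' ' -f1      # POSIX fallback (CRC, not MD5)
  fi
}

# Print each frame whose content differs from the frame before it.
# Arg: directory containing frame-*.png files.
changed_frames() {
  dir=$1
  prev_hash=""
  for f in "$dir"/frame-*.png; do
    [ -e "$f" ] || continue         # skip when the glob matches nothing
    hash=$(file_hash "$f")
    if [ "$hash" != "$prev_hash" ]; then
      echo "$(basename "$f"): CHANGED"
      prev_hash=$hash
    fi
  done
}

# Usage: changed_frames /tmp/frames
```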

# Step 1 (separate Bash call): Start recording
agent-device record start /tmp/loading-evidence.mov

# Step 2 (separate Bash call): Wait for recording to initialize, perform action, wait for completion
sleep 3 && agent-device swipe $X_MID $Y_35PCT $X_MID $Y_75PCT 500 && sleep 5

# Step 3 (separate Bash call): Stop recording
agent-device record stop

# Step 1 (separate Bash call): Start recording
adb -s $SERIAL shell screenrecord /sdcard/agent-rec.mp4 &

# Step 2 (separate Bash call): Wait, perform action, wait
sleep 3 && agent-device swipe $X_MID $Y_35PCT $X_MID $Y_75PCT 500 --session droid && sleep 5

# Step 3 (separate Bash call): Stop + pull
adb -s $SERIAL shell kill -INT $(adb -s $SERIAL shell pidof screenrecord) && sleep 2 && adb -s $SERIAL pull /sdcard/agent-rec.mp4 /tmp/loading-evidence.mp4 && adb -s $SERIAL shell rm -f /sdcard/agent-rec.mp4

# Step 4 (same or separate call): Extract frames + find changes
rm -rf /tmp/loading-frames && mkdir -p /tmp/loading-frames
# Input is the iOS .mov; on Android, pass the pulled /tmp/loading-evidence.mp4 instead
ffmpeg -y -i /tmp/loading-evidence.mov -vf "fps=30" /tmp/loading-frames/frame-%04d.png 2>/dev/null

# Find changed frames via MD5
prev_hash=""
for f in /tmp/loading-frames/frame-*.png; do
  hash=$(md5 -q "$f")
  if [[ "$hash" != "$prev_hash" ]]; then
    echo "$(basename "$f"): CHANGED"
    prev_hash="$hash"
  fi
done

# Downsample specific changed frames for LLM viewing
sips --resampleHeight 852 /tmp/loading-frames/frame-0090.png --out /tmp/loading-frames/view-0090.png >/dev/null

Read /tmp/loading-frames/view-0090.png

Scenario	Approach
Navigating / tapping UI elements	`snapshot -i -c --json` + compute center + `press`
Verifying a loading spinner exists	Video + frame extraction
Visual verification after an action	`screenshot` + downsample + `Read`
Element not in accessibility tree	`screenshot` + percentage estimation
Evidence for PR / bug report	Video recording (share .mov file)

Agent Device

IMPORTANT — agent-device is the ONLY tool for device interaction

Agent Device Interaction

Prohibited agent-device subcommands

Platform Setup

Device Targeting (Required)

Probing for Existing Sessions (Do This First)

Core Workflow — Snapshot-First

1. Take a snapshot

2. Find the target element

3. Compute the center and press

4. Verify with a screenshot

Fallback: Vision-Based Screenshots

Coordinate System — Device-Agnostic

How press coordinates work

Session Start: Discover Press Dimensions (Vision Fallback Only)

Percentage Method

Command Reference

Navigation

Element Discovery

Interactions

Screenshots & Recording

Android Recording Workaround

Device Info

Settings (useful for testing)

CI Known Issues

Tips

Reducing Round Trips

Skip verification screenshots when confident

Reuse coordinates from a recent snapshot

Capturing Transient States (Loading Indicators, Animations)

Approach

Critical: Recording Timing

Verifying Frame Changes

Example: Capturing a Loading Spinner

When to Use Each Approach

Pull-to-Refresh

Time-Sensitive Scripts
