Skill file

Agent Device

Name: Agent Device
Author: GLaDO8

Automates interactions for iOS simulators/devices and Android emulators/devices. Use when navigating apps, taking snapshots/screenshots, tapping, typing, scrolling, or extracting UI info on mobile targets.

GLaDO8, 0 stars, April 7, 2026

Category: Mobile devices

Skill content

Mobile Automation with agent-device

For agent-driven exploration: use refs. For deterministic replay scripts: use selectors.

Quick start

agent-device open Settings --platform ios
agent-device snapshot -i
agent-device press @e3
agent-device wait text "Camera"
agent-device alert wait 10000
agent-device diff snapshot -i
agent-device fill @e5 "test"
agent-device close

If not installed, run:

npx -y agent-device

Core workflow

Open app or deep link: open [app|url] [url] (open handles target selection + boot/activation in the normal flow)
Snapshot: run snapshot to get refs from the accessibility tree

Commands

Navigation

agent-device boot                 # Ensure target is booted/ready without opening app
agent-device boot --platform ios  # Boot iOS target
agent-device boot --platform android # Boot Android emulator/device target
agent-device open [app|url] [url] # Boot device/simulator; optionally launch app or deep link URL
agent-device open [app] --relaunch # Terminate app process first, then launch (fresh runtime)
agent-device open [app] --activity com.example/.MainActivity # Android: open specific activity (app targets only)
agent-device open "myapp://home" --platform android          # Android deep link
agent-device open "https://example.com" --platform ios       # iOS deep link (opens in browser)
agent-device open MyApp "myapp://screen/to" --platform ios   # iOS deep link in app context
agent-device close [app]          # Close app or just end session
agent-device reinstall <app> <path> # Uninstall + install app in one command
agent-device session list         # List active sessions

Snapshot (page analysis)

agent-device snapshot                  # Full XCTest accessibility tree snapshot
agent-device snapshot -i               # Interactive elements only (recommended)
agent-device snapshot -c               # Compact output
agent-device snapshot -d 3             # Limit depth
agent-device snapshot -s "Camera"      # Scope to label/identifier
agent-device snapshot --raw            # Raw node output
agent-device diff snapshot             # Structural diff against previous session baseline
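
This document says a snapshot that returns 0 nodes should be treated as an explicit failure and retried. A minimal sketch of such a retry loop follows. It assumes a failed snapshot exits non-zero, which is not confirmed here; snapshot_with_retry and AD_BIN are hypothetical names for this sketch, not part of agent-device.

```shell
# AD_BIN is a hypothetical override used only in this sketch; it defaults to the CLI.
AD_BIN="${AD_BIN:-agent-device}"

# Retry `snapshot -i` up to $1 times, sleeping $2 seconds between attempts.
# Assumption: a failed snapshot (for example 0 nodes) exits non-zero.
snapshot_with_retry() {
  attempts="$1"
  delay="$2"
  n=1
  while [ "$n" -le "$attempts" ]; do
    if $AD_BIN snapshot -i; then
      return 0
    fi
    echo "snapshot attempt $n failed" >&2
    n=$((n + 1))
    [ "$n" -le "$attempts" ] && sleep "$delay"
  done
  return 1
}
```

Usage: snapshot_with_retry 3 1 tries up to three times with a one-second pause. If the CLI reports empty trees with a zero exit status, the check would need to inspect the output instead.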

Find (semantic)

agent-device find "Sign In" click
agent-device find text "Sign In" click
agent-device find label "Email" fill "user@example.com"
agent-device find value "Search" type "query"
agent-device find role button click
agent-device find id "com.example:id/login" click
agent-device find "Settings" wait 10000
agent-device find "Settings" exists

Settings helpers

agent-device settings wifi on
agent-device settings wifi off
agent-device settings airplane on
agent-device settings airplane off
agent-device settings location on
agent-device settings location off
agent-device settings faceid match
agent-device settings faceid nonmatch
agent-device settings faceid enroll
agent-device settings faceid unenroll

Logs (token-efficient debugging)

agent-device logs doctor
agent-device logs start
agent-device logs path

App state

agent-device appstate

Interactions (use @refs from snapshot)

agent-device press @e1                # Canonical tap command (`click` is an alias)
agent-device focus @e2
agent-device fill @e2 "text"           # Clear then type (Android: verifies value and retries once on mismatch)
agent-device type "text"               # Type into focused field without clearing
agent-device press 300 500             # Tap by coordinates
agent-device press 300 500 --count 12 --interval-ms 45
agent-device press 300 500 --count 6 --hold-ms 120 --interval-ms 30 --jitter-px 2
agent-device press @e1 --count 5             # Repeat taps on the same target
agent-device press @e1 --count 5 --double-tap # Use double-tap gesture per iteration
agent-device swipe 540 1500 540 500 120
agent-device swipe 540 1500 540 500 120 --count 8 --pause-ms 30 --pattern ping-pong
agent-device longpress 300 500 800     # Long press on iOS and Android
agent-device scroll down 0.5
agent-device pinch 2.0              # Zoom in 2x (iOS simulator only)
agent-device pinch 0.5 200 400     # Zoom out at coordinates (iOS simulator only)
agent-device back
agent-device home
agent-device app-switcher
agent-device wait 1000
agent-device wait text "Settings"
agent-device is visible 'id="settings_anchor"'  # selector assertions for deterministic checks
agent-device is text 'id="header_title"' "Settings"
agent-device alert get
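
press and click accept the series flags --count, --interval-ms, --hold-ms, --jitter-px, and --double-tap, but --double-tap cannot be combined with --hold-ms or --jitter-px. A minimal sketch of a pre-flight check for that one rule follows; press_flags_ok is a hypothetical helper for scripts, not part of agent-device.

```shell
# Return 0 if the given press/click arguments are a valid flag combination,
# 1 otherwise. Encodes only the rule stated in this document:
# --double-tap cannot be combined with --hold-ms or --jitter-px.
press_flags_ok() {
  double=0
  conflict=0
  for arg in "$@"; do
    case "$arg" in
      --double-tap) double=1 ;;
      --hold-ms|--jitter-px) conflict=1 ;;
    esac
  done
  if [ "$double" -eq 1 ] && [ "$conflict" -eq 1 ]; then
    echo "error: --double-tap cannot be combined with --hold-ms or --jitter-px" >&2
    return 1
  fi
  return 0
}
```

Usage: press_flags_ok @e1 --count 5 --double-tap && agent-device press @e1 --count 5 --double-tap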

Get information

agent-device get text @e1
agent-device get attrs @e1
agent-device screenshot out.png

Deterministic replay and updating

agent-device open App --relaunch      # Fresh app process restart in the current session
agent-device open App --save-script   # Save session script (.ad) on close (default path)
agent-device open App --save-script ./workflows/app-flow.ad  # Save to custom file path
agent-device replay ./session.ad      # Run deterministic replay from .ad script
agent-device replay -u ./session.ad   # Update selector drift and rewrite .ad script in place
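
This document recommends replay -u for maintenance runs and plain replay in CI. One way to keep the two apart is to gate -u behind an environment variable, as in the minimal sketch below. run_replay, REPLAY_UPDATE, and AD_BIN are hypothetical names for this sketch, not part of agent-device.

```shell
# AD_BIN is a hypothetical override used only in this sketch; it defaults to the CLI.
AD_BIN="${AD_BIN:-agent-device}"

# Run a .ad script: plain replay by default (CI),
# `replay -u` only when REPLAY_UPDATE=1 (maintenance run, rewrites the script).
run_replay() {
  script="$1"
  if [ "${REPLAY_UPDATE:-0}" = "1" ]; then
    $AD_BIN replay -u "$script"
  else
    $AD_BIN replay "$script"
  fi
}
```

Usage: run_replay ./session.ad in CI; REPLAY_UPDATE=1 run_replay ./session.ad when selectors have drifted and the script should be rewritten in place.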

Fast batching (JSON steps)

agent-device batch \
  --session sim \
  --platform ios \
  --udid 00008150-001849640CF8401C \
  --steps-file /tmp/batch-steps.json \
  --json

agent-device batch --steps '[{"command":"open","positionals":["settings"]},{"command":"wait","positionals":["100"]}]'

[
  { "command": "open", "positionals": ["settings"], "flags": {} },
  { "command": "wait", "positionals": ["label=\"Privacy & Security\"", "3000"], "flags": {} },
  { "command": "click", "positionals": ["label=\"Privacy & Security\""], "flags": {} },
  { "command": "get", "positionals": ["text", "label=\"Tracking\""], "flags": {} }
]
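
The steps array above can be written to a file and passed with --steps-file. A minimal sketch follows, reusing the step shape and flags shown in this document; write_steps_file, run_batch, and AD_BIN are hypothetical names, not part of agent-device.

```shell
# AD_BIN is a hypothetical override used only in this sketch; it defaults to the CLI.
AD_BIN="${AD_BIN:-agent-device}"

# Write a two-step batch file to the path given in $1.
# The quoted heredoc delimiter keeps the JSON escapes (\") untouched.
write_steps_file() {
  cat > "$1" <<'EOF'
[
  { "command": "open", "positionals": ["settings"], "flags": {} },
  { "command": "wait", "positionals": ["label=\"Privacy & Security\"", "3000"], "flags": {} }
]
EOF
}

# Run the steps file as one batch and request JSON output.
run_batch() {
  $AD_BIN batch --steps-file "$1" --json
}
```

Usage: write_steps_file /tmp/batch-steps.json && run_batch /tmp/batch-steps.json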

Trace logs (XCTest)

agent-device trace start               # Start trace capture
agent-device trace start ./trace.log   # Start trace capture to path
agent-device trace stop                # Stop trace capture
agent-device trace stop ./trace.log    # Stop and move trace log

Devices and apps

agent-device devices
agent-device apps --platform ios              # iOS simulator + iOS device, includes default/system apps
agent-device apps --platform ios --all        # explicit include-all (same as default)
agent-device apps --platform ios --user-installed
agent-device apps --platform android          # includes default/system apps
agent-device apps --platform android --all    # explicit include-all (same as default)
agent-device apps --platform android --user-installed

Best practices

press is the canonical tap command; click is an alias with the same behavior.
press (and click) accepts x y, @ref, and selector targets.
press/click support gesture series controls: --count, --interval-ms, --hold-ms, --jitter-px, --double-tap.
--double-tap cannot be combined with --hold-ms or --jitter-px.
swipe supports coordinate + timing controls and repeat patterns: swipe x1 y1 x2 y2 [durationMs] --count --pause-ms --pattern.
swipe timing is platform-safe: Android uses requested duration; iOS uses normalized safe timing to avoid longpress side effects.
longpress is coordinate-based and supported on iOS and Android.
Pinch (pinch <scale> [x y]) is iOS simulator-only; scale > 1 zooms in, < 1 zooms out.
Snapshot refs are the core mechanism for interactive agent flows.
Use selectors for deterministic replay artifacts and assertions (e.g. in e2e test workflows).
Prefer snapshot -i to reduce output size.
Prefer scoped snapshots (-s "<label>" or -s @ref) for screen-local tasks.
Add -d <depth> when only upper tree levels matter; avoid full-tree snapshots by default.
Use diff snapshot after mutations to detect structural changes with less output than full re-read.
Refresh refs immediately after navigation/modal/list mutations before issuing next ref-targeted action.
Use --raw only for debugging parser/tree edge-cases; avoid it for normal agent loops due to size.
On iOS, snapshots use XCTest and do not require Accessibility permission.
If XCTest returns 0 nodes (foreground app changed), treat it as an explicit failure and retry the flow/app state.
open <app|url> [url] can be used within an existing session to switch apps or open deep links.
open <app> updates session app bundle context; open <app> <url> opens a deep link on iOS.
Use open <app> --relaunch during React Native/Fast Refresh debugging when you need a fresh app process without ending the session.
Use --session <name> for parallel sessions; avoid device contention.
Use --activity <component> on Android to launch a specific activity (e.g. TV apps with LEANBACK); do not combine with URL opens.
On iOS devices, http(s):// URLs fall back to Safari automatically; custom scheme URLs require an active app in the session.
iOS physical-device runner requires Xcode signing/provisioning; optional overrides: AGENT_DEVICE_IOS_TEAM_ID, AGENT_DEVICE_IOS_SIGNING_IDENTITY, AGENT_DEVICE_IOS_PROVISIONING_PROFILE.
Default daemon request timeout is 45000ms. For slow physical-device setup/build, increase AGENT_DEVICE_DAEMON_TIMEOUT_MS (for example 120000).
For daemon startup troubleshooting, follow stale metadata hints for ~/.agent-device/daemon.json / ~/.agent-device/daemon.lock.
Use fill when you want clear-then-type semantics.
Use type when you want to append/enter text without clearing.
On Android, prefer fill for important fields; it verifies entered text and retries once when IME reorders characters.
For deterministic replay scripts, use replay -u during maintenance runs to update selector drift, and plain replay in CI.
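
The default daemon request timeout of 45000ms can be raised through AGENT_DEVICE_DAEMON_TIMEOUT_MS for slow physical-device setup. A minimal sketch of a per-command override follows. It assumes the variable is read from the environment of the invoking process; with_long_timeout and AD_BIN are hypothetical names, not part of agent-device.

```shell
# AD_BIN is a hypothetical override used only in this sketch; it defaults to the CLI.
AD_BIN="${AD_BIN:-agent-device}"

# Run one agent-device command with a longer daemon request timeout.
# 120000 is the example value from this document; an already-set
# AGENT_DEVICE_DAEMON_TIMEOUT_MS takes precedence.
with_long_timeout() {
  AGENT_DEVICE_DAEMON_TIMEOUT_MS="${AGENT_DEVICE_DAEMON_TIMEOUT_MS:-120000}" $AD_BIN "$@"
}
```

Usage: with_long_timeout open MyApp --platform ios. Scoping the override to a single call keeps the default timeout for fast simulator commands.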
