Skill ファイル

agent-browser core

Name: agent-browser core
Author: vercel-labs

Core agent-browser usage guide. Read this before running any agent-browser commands. Covers the snapshot-and-ref workflow, navigating pages, interacting with elements (click, fill, type, select), extracting text and data, taking screenshots, managing tabs, handling forms and auth, waiting for content, running multiple browser sessions in parallel, and troubleshooting common failures. Use when the user asks to interact with a website, fill a form, click something, extract data, take a screenshot, log into a site, test a web app, or automate any browser task.

vercel-labs29,695 スター2026/04/16

職業
カテゴリ: デバッグ

スキル内容

Fast browser automation CLI for AI agents. Chrome/Chromium via CDP, no Playwright or Puppeteer dependency. Accessibility-tree snapshots with compact @eN refs let agents interact with pages in ~200-400 tokens instead of parsing raw HTML.

Most normal web tasks (navigate, read, click, fill, extract, screenshot) are covered here. Load a specialized skill when the task falls outside browser web pages — see When to load another skill.

The core loop

agent-browser open <url>        # 1. Open a page
agent-browser snapshot -i       # 2. See what's on it (interactive elements only)
agent-browser click @e3         # 3. Act on refs from the snapshot
agent-browser snapshot -i       # 4. Re-snapshot after any page change

Refs (@e1, @e2, ...) are assigned fresh on every snapshot. They become stale the moment the page changes — after clicks that navigate, form submits, dynamic re-renders, dialog opens. Always re-snapshot before your next ref interaction.

関連 Skill

agent-browser core | Skills Pool

# Install once
npm i -g agent-browser && agent-browser install

# Take a screenshot of a page
agent-browser open https://example.com
agent-browser screenshot home.png
agent-browser close

# Search, click a result, and capture it
agent-browser open https://duckduckgo.com
agent-browser snapshot -i                      # find the search box ref
agent-browser fill @e1 "agent-browser cli"
agent-browser press Enter
agent-browser wait --load networkidle
agent-browser snapshot -i                      # refs now reflect results
agent-browser click @e5                        # click a result
agent-browser screenshot result.png

agent-browser snapshot                    # full tree (verbose)
agent-browser snapshot -i                 # interactive elements only (preferred)
agent-browser snapshot -i -u              # include href urls on links
agent-browser snapshot -i -c              # compact (no empty structural nodes)
agent-browser snapshot -i -d 3            # cap depth at 3 levels
agent-browser snapshot -s "#main"         # scope to a CSS selector
agent-browser snapshot -i --json          # machine-readable output

Page: Example - Log in
URL: https://example.com/login

@e1 [heading] "Log in"
@e2 [form]
  @e3 [input type="email"] placeholder="Email"
  @e4 [input type="password"] placeholder="Password"
  @e5 [button type="submit"] "Continue"
  @e6 [link] "Forgot password?"

agent-browser get text @e1                # visible text of an element
agent-browser get html @e1                # innerHTML
agent-browser get attr @e1 href           # any attribute
agent-browser get value @e1               # input value
agent-browser get title                   # page title
agent-browser get url                     # current URL
agent-browser get count ".item"           # count matching elements

agent-browser click @e1                   # click
agent-browser click @e1 --new-tab         # open link in new tab instead of navigating
agent-browser dblclick @e1                # double-click
agent-browser hover @e1                   # hover
agent-browser focus @e1                   # focus (useful before keyboard input)
agent-browser fill @e2 "hello"            # clear then type
agent-browser type @e2 " world"           # type without clearing
agent-browser press Enter                 # press a key at current focus
agent-browser press Control+a             # key combination
agent-browser check @e3                   # check checkbox
agent-browser uncheck @e3                 # uncheck
agent-browser select @e4 "option-value"   # select dropdown option
agent-browser select @e4 "a" "b"          # select multiple
agent-browser upload @e5 file1.pdf        # upload file(s)
agent-browser scroll down 500             # scroll page (up/down/left/right)
agent-browser scrollintoview @e1          # scroll element into view
agent-browser drag @e1 @e2                # drag and drop

agent-browser find role button click --name "Submit"
agent-browser find text "Sign In" click
agent-browser find text "Sign In" click --exact     # exact match only
agent-browser find label "Email" fill "[email protected]"
agent-browser find placeholder "Search" type "query"
agent-browser find testid "submit-btn" click
agent-browser find first ".card" click
agent-browser find nth 2 ".card" hover

agent-browser click "#submit"
agent-browser fill "input[name=email]" "[email protected]"
agent-browser click "button.primary"

agent-browser wait @e1                     # until an element appears
agent-browser wait 2000                    # dumb wait, milliseconds (last resort)
agent-browser wait --text "Success"        # until the text appears on the page
agent-browser wait --url "**/dashboard"    # until URL matches pattern (glob)
agent-browser wait --load networkidle      # until network idle (post-navigation)
agent-browser wait --load domcontentloaded # until DOMContentLoaded
agent-browser wait --fn "window.myApp.ready === true"  # until JS condition

agent-browser open https://app.example.com/login
agent-browser snapshot -i

# Pick the email/password refs out of the snapshot, then:
agent-browser fill @e3 "[email protected]"
agent-browser fill @e4 "hunter2"
agent-browser click @e5
agent-browser wait --url "**/dashboard"
agent-browser snapshot -i

agent-browser auth save my-app --url https://app.example.com/login \
  --username [email protected] --password-stdin
# (type password, Ctrl+D)

agent-browser auth login my-app    # fills + clicks, waits for form

# Log in once, save cookies + localStorage
agent-browser state save ./auth.json

# Later runs start already-logged-in
agent-browser --state ./auth.json open https://app.example.com

AGENT_BROWSER_SESSION_NAME=my-app agent-browser open https://app.example.com
# State is auto-saved and restored on subsequent runs with the same name.

# Structured snapshot (best for AI reasoning over page content)
agent-browser snapshot -i --json > page.json

# Targeted extraction with refs
agent-browser snapshot -i
agent-browser get text @e5
agent-browser get attr @e10 href

# Arbitrary shape via JavaScript
cat <<'EOF' | agent-browser eval --stdin
const rows = document.querySelectorAll("table tbody tr");
Array.from(rows).map(r => ({
  name: r.cells[0].innerText,
  price: r.cells[1].innerText,
}));
EOF

agent-browser screenshot                        # temp path, printed on stdout
agent-browser screenshot page.png               # specific path
agent-browser screenshot --full full.png        # full scroll height
agent-browser screenshot --annotate map.png     # numbered labels + legend keyed to snapshot refs

agent-browser tab                      # list open tabs (with stable tabId)
agent-browser tab new https://docs...  # open a new tab (and switch to it)
agent-browser tab 2                    # switch to tab 2
agent-browser tab close 2              # close tab 2

agent-browser --session a open https://app.example.com
agent-browser --session b open https://app.example.com
agent-browser --session a fill @e1 "[email protected]"
agent-browser --session b fill @e1 "[email protected]"

agent-browser network route "**/api/users" --body '{"users":[]}'   # stub a response
agent-browser network route "**/analytics" --abort                 # block entirely
agent-browser network requests                                     # inspect what fired
agent-browser network har start                                    # record all traffic
# ... perform actions ...
agent-browser network har stop /tmp/trace.har

agent-browser record start demo.webm
agent-browser open https://example.com
agent-browser snapshot -i
agent-browser click @e3
agent-browser record stop

agent-browser snapshot -i
# @e3 [Iframe] "payment-frame"
#   @e4 [input] "Card number"
#   @e5 [button] "Pay"

agent-browser fill @e4 "4111111111111111"
agent-browser click @e5

agent-browser frame @e3      # switch context to the iframe
agent-browser snapshot -i
agent-browser frame main     # back to main frame

agent-browser dialog status          # is there a pending dialog?
agent-browser dialog accept           # accept
agent-browser dialog accept "text"    # accept with prompt input
agent-browser dialog dismiss          # cancel

agent-browser doctor                     # full diagnosis (env, Chrome, daemons, config, providers, network, launch test)
agent-browser doctor --offline --quick   # fast, local-only
agent-browser doctor --fix               # also run destructive repairs (reinstall Chrome, purge old state, ...)
agent-browser doctor --json              # structured output for programmatic consumption

agent-browser scroll down 1000
agent-browser snapshot -i
# or
agent-browser wait --text "..."
agent-browser snapshot -i

agent-browser focus @e1
agent-browser keyboard inserttext "text"    # bypasses key events
# or
agent-browser keyboard type "text"          # raw keystrokes, no selector

cat <<'EOF' | agent-browser eval --stdin
// Complex script with quotes, backticks, whatever
document.querySelectorAll('[data-id]').length
EOF

--session <name>        # isolated browser session
--json                  # JSON output (for machine parsing)
--headed                # show the window (default is headless)
--auto-connect          # connect to an already-running Chrome
--cdp <port>            # connect to a specific CDP port
--profile <name|path>   # use a Chrome profile (login state survives)
--headers <json>        # HTTP headers scoped to the URL's origin
--proxy <url>           # proxy server
--state <path>          # load saved auth state from JSON
--session-name <name>   # auto-save/restore session state by name

agent-browser skills get core --full

agent-browser core

The core loop

agent-browser core

The core loop

Quickstart

Reading a page

Interacting

When refs don't work or you don't want to snapshot

Waiting (read this)

Common workflows

Log in

Persist session across runs

Extract data

Screenshot

Handle multiple pages via tabs

Run multiple browsers in parallel

Mock network requests

Record a video of the workflow

Iframes

Dialogs

Diagnosing install issues

Troubleshooting

Global flags worth knowing

When to load another skill

Full reference

Session Logs

OpenClaw Test Heap Leaks

Node Connect

Openclaw Qa Testing

Openclaw Secret Scanning Maintainer

Flags