Browser automation CLI for AI agents. Use when the user needs to interact with websites, including navigating pages, filling forms, clicking buttons, taking screenshots, extracting data, testing web apps, or automating any browser task. Triggers include requests to "open a website", "fill out a form", "click a button", "take a screenshot", "scrape data from a page", "test this web app", "login to a site", "automate browser actions", or any task requiring programmatic web interaction.
Every browser automation follows this pattern:
agent-browser open <url>agent-browser snapshot -i (get element refs like @e1, @e2)```bash agent-browser open https://example.com/form agent-browser snapshot -i
agent-browser fill @e1 "[email protected]" agent-browser fill @e2 "password123" agent-browser click @e3 agent-browser wait --load networkidle agent-browser snapshot -i # Check result ```
```bash
agent-browser open <url> # Navigate (aliases: goto, navigate) agent-browser close # Close browser
agent-browser snapshot -i # Interactive elements with refs (recommended) agent-browser snapshot -i -C # Include cursor-interactive elements (divs with onclick, cursor:pointer) agent-browser snapshot -s "#selector" # Scope to CSS selector
agent-browser click @e1 # Click element agent-browser fill @e2 "text" # Clear and type text agent-browser type @e2 "text" # Type without clearing agent-browser select @e1 "option" # Select dropdown option agent-browser check @e1 # Check checkbox agent-browser press Enter # Press key agent-browser scroll down 500 # Scroll page
agent-browser get text @e1 # Get element text agent-browser get url # Get current URL agent-browser get title # Get page title
agent-browser wait @e1 # Wait for element agent-browser wait --load networkidle # Wait for network idle agent-browser wait --url "**/page" # Wait for URL pattern agent-browser wait 2000 # Wait milliseconds
agent-browser screenshot # Screenshot to temp dir agent-browser screenshot --full # Full page screenshot agent-browser pdf output.pdf # Save as PDF ```
```bash agent-browser open https://example.com/signup agent-browser snapshot -i agent-browser fill @e1 "Jane Doe" agent-browser fill @e2 "[email protected]" agent-browser select @e3 "California" agent-browser check @e4 agent-browser click @e5 agent-browser wait --load networkidle ```
```bash
agent-browser open https://app.example.com/login agent-browser snapshot -i agent-browser fill @e1 "$USERNAME" agent-browser fill @e2 "$PASSWORD" agent-browser click @e3 agent-browser wait --url "**/dashboard" agent-browser state save auth.json
agent-browser state load auth.json agent-browser open https://app.example.com/dashboard ```
```bash agent-browser open https://example.com/products agent-browser snapshot -i agent-browser get text @e5 # Get specific element text agent-browser get text body > page.txt # Get all page text
agent-browser snapshot -i --json agent-browser get text @e1 --json ```
```bash agent-browser --session site1 open https://site-a.com agent-browser --session site2 open https://site-b.com
agent-browser --session site1 snapshot -i agent-browser --session site2 snapshot -i
agent-browser session list ```
```bash agent-browser --headed open https://example.com agent-browser highlight @e1 # Highlight element agent-browser record start demo.webm # Record session ```
```bash
agent-browser --allow-file-access open file:///path/to/document.pdf agent-browser --allow-file-access open file:///path/to/page.html agent-browser screenshot output.png ```
```bash
agent-browser device list
agent-browser -p ios --device "iPhone 16 Pro" open https://example.com
agent-browser -p ios snapshot -i agent-browser -p ios tap @e1 # Tap (alias for click) agent-browser -p ios fill @e2 "text" agent-browser -p ios swipe up # Mobile-specific gesture
agent-browser -p ios screenshot mobile.png
agent-browser -p ios close ```
Requirements: macOS with Xcode, Appium (`npm install -g appium && appium driver install xcuitest`)
Real devices: Works with physical iOS devices if pre-configured. Use `--device "<UDID>"` where UDID is from `xcrun xctrace list devices`.
Refs (`@e1`, `@e2`, etc.) are invalidated when the page changes. Always re-snapshot after:
```bash agent-browser click @e5 # Navigates to new page agent-browser snapshot -i # MUST re-snapshot agent-browser click @e1 # Use new refs ```
When refs are unavailable or unreliable, use semantic locators:
```bash agent-browser find text "Sign In" click agent-browser find label "Email" fill "[email protected]" agent-browser find role button click --name "Submit" agent-browser find placeholder "Search" type "query" agent-browser find testid "submit-btn" click ```
| Reference | When to Use |
|---|---|
| references/commands.md | Full command reference with all options |
| references/snapshot-refs.md | Ref lifecycle, invalidation rules, troubleshooting |
| references/session-management.md | Parallel sessions, state persistence, concurrent scraping |
| references/authentication.md | Login flows, OAuth, 2FA handling, state reuse |
| references/video-recording.md | Recording workflows for debugging and documentation |
| references/proxy-support.md | Proxy configuration, geo-testing, rotating proxies |
| Template | Description |
|---|---|
| templates/form-automation.sh | Form filling with validation |
| templates/authenticated-session.sh | Login once, reuse state |
| templates/capture-workflow.sh | Content extraction with screenshots |
```bash ./templates/form-automation.sh https://example.com/form ./templates/authenticated-session.sh https://app.example.com/login ./templates/capture-workflow.sh https://example.com ./output ```