Fast headless CLI browser for all web tasks — reading pages, scraping, form filling, login flows, complex interactions. Default choice for any web browsing task. Fall back to Scout (browse tool) only if this fails.
Fast Rust CLI for browser automation via Chrome DevTools Protocol. Use this for all web browsing tasks — reading pages, scraping, form filling, extracting data, navigating complex flows. Use exec() to run commands.
Fall back to browse (Scout) only when:
agent-browser returns a non-zero exit code or clearly can't handle the taskAlways ask User before starting:
"Should I open the browser so you can see it (headed), or run it invisibly in the background (headless)?"
--headed) — browser window opens visibly. User can watch, intervene, or take over. Use this by default unless he says otherwise.exec("agent-browser open https://example.com --headed") # headed (visible)
exec("agent-browser open https://example.com") # headless (invisible)
--headed)@e1, @e2)exec("agent-browser open https://example.com --headed") # if headed
exec("agent-browser snapshot -i") # -i = interactive elements only (recommended)
exec("agent-browser click @e3")
exec("agent-browser fill @e5 'search term'")
exec("agent-browser press Enter")
exec("agent-browser wait --load networkidle")
exec("agent-browser snapshot -i") # re-snapshot after navigation
exec("agent-browser get text @e12")
agent-browser open <url> — navigate to URLagent-browser goto <url> — alias for openagent-browser close — shut down browseragent-browser wait --load networkidle — wait for page to settleagent-browser wait <ms> — wait duration (e.g. wait 1000)agent-browser wait --text "some text" — wait for text to appearagent-browser snapshot -i — accessibility tree of interactive elements with refs (recommended)agent-browser snapshot — full accessibility treeagent-browser snapshot --json — machine-readable JSON outputagent-browser screenshot — capture screenshotagent-browser screenshot --annotate — screenshot with numbered element labelsagent-browser click @e1 — click element by refagent-browser fill @e2 "text" — clear and fill inputagent-browser type @e2 "text" — type into element (appends)agent-browser press Enter — press keyboard key (Enter, Tab, Escape, etc.)agent-browser hover @e3 — hover elementagent-browser select @e4 "option" — select dropdown optionagent-browser check @e5 / agent-browser uncheck @e5 — toggle checkboxagent-browser drag @e1 @e2 — drag element to targetagent-browser scroll down 500 — scroll pageagent-browser eval 'document.title' — run JavaScriptagent-browser get text @e1 — get element textagent-browser get text body — get full page textagent-browser get html @e1 — get innerHTMLagent-browser get value @e1 — get input valueagent-browser get attr @e1 href — get attributeagent-browser get url — current URLagent-browser get title — page titleagent-browser find role button "Submit" — by ARIA roleagent-browser find text "Sign in" — by text contentagent-browser find label "Email" — by labelagent-browser find placeholder "Search..." — by placeholderagent-browser find testid "submit-btn" — by data-testidagent-browser set viewport 1920 1080 — set window sizeagent-browser set device "iPhone 14" — emulate deviceagent-browser cookies — manage cookiesagent-browser storage local — manage localStorageagent-browser screenshot --full — full-page screenshotagent-browser pdf /tmp/page.pdf — save as PDFagent-browser console — get console outputagent-browser errors — get page errorsagent-browser diff screenshot --baseline base.png — visual diffagent-browser uses its OWN bundled Chromium — not your system Chrome. Your existing browser logins do not carry over. Sessions only persist when a named --profile is used.
--profile main — one unified profile for everythingexec("agent-browser open https://site.com --headed --profile main")
Use one profile (main) for all sites — LinkedIn, GitHub, Gmail, everything. All logins accumulate in this single profile, exactly like a real browser. A task that touches multiple sites in one session works seamlessly because all sessions are available simultaneously.
Never create per-site profiles (--profile linkedin, --profile github, etc.) — those are isolated contexts that can't share sessions and will break multi-site tasks.
First-time login for a new site (one-time setup):
exec("agent-browser open https://site.com --headed --profile main")main permanently — all future tasks reuse itAll future sessions: --profile main has all accumulated logins. No re-login needed for any previously-authenticated site.
If you hit a login wall: close and retry immediately with --profile main. Do NOT attempt to automate the login form — ask User to log in once through the headed window.
Connect to Scout's Chrome session (only available while Scout is running, port 9222):
exec("agent-browser connect 9222")
Fall back to Scout if agent-browser can't handle authentication at all:
browse(task="<task with full URL>", launch_browser=True)
# Scout uses User's real Chrome profile with all logins active
If agent-browser fails (non-zero exit, CAPTCHA, broken page, login wall it can't handle):
# Hand off to Scout — it uses User's full Chrome profile with all logins
browse(task="<original task with full URL and instructions>", launch_browser=True)
# Read a page (headed — User can watch)
exec("agent-browser open https://news.ycombinator.com --headed")
exec("agent-browser get text body")
# Scrape with structure (headless — no need to watch)
exec("agent-browser open https://news.ycombinator.com")
exec("agent-browser snapshot --json") # parse JSON for structured data
# Search flow
exec("agent-browser open https://google.com --headed")
exec("agent-browser snapshot -i")
exec("agent-browser fill @e2 'Python async tutorial'")
exec("agent-browser press Enter")
exec("agent-browser wait --load networkidle")
exec("agent-browser snapshot -i")
exec("agent-browser get text body")
# Form fill
exec("agent-browser open https://example.com/contact")
exec("agent-browser snapshot -i")
exec("agent-browser fill @e3 'User'") # Name field ref from snapshot
exec("agent-browser fill @e4 '[email protected]'") # Email field ref
exec("agent-browser fill @e5 'Hello'") # Message ref
exec("agent-browser click @e8") # Submit button ref
exec("agent-browser wait --load networkidle")
exec("agent-browser get text body") # Confirm submission
# Screenshot for evidence
# stdout: "✓ Screenshot saved to C:\Users\..\.agent-browser\tmp\screenshots\screenshot-TIMESTAMP.png"
result = exec("agent-browser screenshot")
path = result.stdout.split("Screenshot saved to")[-1].strip()
send_user_media(path=path)
@e1, @e2) become invalid after page changes-i flag — filters to interactive elements only, cleaner and faster for agentssnapshot --json when you need structured/parseable output