Browser automation via MCP tools. ALWAYS use these tools for ANY web task - navigating sites, clicking, typing, filling forms, taking screenshots, or extracting data. This is the ONLY way to control the browser.
NEVER use bash/shell commands to open browsers or URLs. This includes:
- open (macOS)
- xdg-open (Linux)
- start (Windows)
- subprocess, webbrowser, or similar

Shell commands open the user's default browser (Safari, Arc, Firefox, etc.), not the automation-controlled Chrome instance. This breaks the workflow because you cannot interact with pages opened via shell commands.
ALL browser automation MUST use the browser_* MCP tools below.
For multi-step workflows, ALWAYS use browser_script. It's 5-10x faster than individual calls.
browser_script finds elements at RUNTIME using CSS selectors - you don't need refs beforehand. This enables complete workflows in ONE call.
browser_script(actions=[
{"action": "goto", "url": "example.com/login"},
{"action": "waitForLoad"},
{"action": "findAndFill", "selector": "input[type='email']", "text": "user@example.com"},
{"action": "findAndFill", "selector": "input[type='password']", "text": "secret123"},
{"action": "findAndClick", "selector": "button[type='submit']"},
{"action": "waitForNavigation"}
])
This executes the entire login in ONE roundtrip. A snapshot is automatically returned so you always see the final page state.
| Action | Parameters | Description |
|---|---|---|
| goto | url | Navigate to URL |
| waitForLoad | timeout? | Wait for page to load |
| waitForSelector | selector, timeout? | Wait for element to appear |
| waitForNavigation | timeout? | Wait for navigation to complete |
| findAndFill | selector, text, pressEnter?, skipIfNotFound? | Find element, fill it |
| findAndClick | selector, skipIfNotFound? | Find element, click it |
| fillByRef | ref, text, pressEnter? | Fill using ref from snapshot |
| clickByRef | ref | Click using ref from snapshot |
| snapshot | - | Get ARIA snapshot (returned at end) |
| screenshot | fullPage? | Take screenshot (returned at end) |
| keyboard | key OR text | Press key or type text |
| evaluate | code | Run JavaScript in page |
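The parameter table can double as a pre-flight check before a script is sent. The following is a minimal sketch, not part of the tools themselves: `ACTION_PARAMS` and `validate_actions` are hypothetical names, parameters marked `?` are treated as optional, "key OR text" is read as exactly one of the two, and rejecting unlisted parameters is an assumption (the real tool may accept more).

```python
# Required and optional parameters per action, transcribed from the table.
ACTION_PARAMS = {
    "goto": ({"url"}, set()),
    "waitForLoad": (set(), {"timeout"}),
    "waitForSelector": ({"selector"}, {"timeout"}),
    "waitForNavigation": (set(), {"timeout"}),
    "findAndFill": ({"selector", "text"}, {"pressEnter", "skipIfNotFound"}),
    "findAndClick": ({"selector"}, {"skipIfNotFound"}),
    "fillByRef": ({"ref", "text"}, {"pressEnter"}),
    "clickByRef": ({"ref"}, set()),
    "snapshot": (set(), set()),
    "screenshot": (set(), {"fullPage"}),
    "evaluate": ({"code"}, set()),
}


def validate_actions(actions):
    """Return a list of problems; an empty list means the script looks well-formed."""
    problems = []
    for i, step in enumerate(actions):
        name = step.get("action")
        params = set(step) - {"action"}
        if name == "keyboard":
            # Assumption: "key OR text" means exactly one of the two.
            if len(params & {"key", "text"}) != 1 or params - {"key", "text"}:
                problems.append(f"step {i}: keyboard needs exactly one of key or text")
            continue
        if name not in ACTION_PARAMS:
            problems.append(f"step {i}: unknown action {name!r}")
            continue
        required, optional = ACTION_PARAMS[name]
        missing = required - params
        extra = params - required - optional
        if missing:
            problems.append(f"step {i}: {name} is missing {sorted(missing)}")
        if extra:
            problems.append(f"step {i}: {name} has unexpected {sorted(extra)}")
    return problems
```

Running the check on a draft action list before calling browser_script catches typos such as a misspelled action name without spending a roundtrip.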
- findAndFill and findAndClick locate elements when the action runs
- goto and clicks wait for page stability

browser_script(actions=[
{"action": "goto", "url": "example.com/signup"},
{"action": "waitForLoad"},
{"action": "findAndFill", "selector": "#name", "text": "John Doe"},
{"action": "findAndFill", "selector": "#email", "text": "user@example.com"},
{"action": "findAndFill", "selector": "#phone", "text": "555-1234", "skipIfNotFound": true},
{"action": "findAndClick", "selector": "button[type='submit']"},
{"action": "waitForNavigation"}
])
Note: No need to add snapshot at the end - it's automatic!
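Form scripts like the one above all follow the same shape: goto, waitForLoad, one findAndFill per field, findAndClick, waitForNavigation. As a hedged sketch, the action list could be generated from a mapping of selectors to values. `form_fill_actions` and its `optional` argument are hypothetical helpers, and the comment on skipIfNotFound reflects what the parameter name suggests rather than documented behavior.

```python
def form_fill_actions(url, fields, submit_selector, optional=()):
    """Build a browser_script action list: open url, fill each field, submit, wait.

    fields maps CSS selector -> text. Selectors listed in `optional` get
    skipIfNotFound, which presumably lets the script continue if the field is absent.
    """
    actions = [{"action": "goto", "url": url}, {"action": "waitForLoad"}]
    for selector, text in fields.items():
        step = {"action": "findAndFill", "selector": selector, "text": text}
        if selector in optional:
            step["skipIfNotFound"] = True
        actions.append(step)
    actions.append({"action": "findAndClick", "selector": submit_selector})
    actions.append({"action": "waitForNavigation"})
    return actions


actions = form_fill_actions(
    "example.com/signup",
    {"#name": "John Doe", "#phone": "555-1234"},
    "button[type='submit']",
    optional={"#phone"},
)
```

The resulting list is what would be passed as the actions argument of browser_script.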
browser_navigate(url, page_name?) - Navigate to a URL
browser_snapshot(page_name?, interactive_only?) - Get the page's accessibility tree
browser_click(x?, y?, ref?, selector?, page_name?) - Click on the page
browser_type(ref?, selector?, text, press_enter?, page_name?) - Type into an input
browser_screenshot(page_name?, full_page?) - Take a screenshot
browser_evaluate(script, page_name?) - Run custom JavaScript
browser_pages(action, page_name?) - Manage pages
browser_keyboard(text?, key?, page_name?) - Type to the focused element
browser_script(actions, page_name?) - Execute complete workflows in ONE call (see above)
browser_batch_actions(urls, extractScript, waitForSelector?, page_name?) - Extract data from multiple URLs in ONE call
browser_sequence(actions, page_name?) - Simpler batching (requires refs beforehand)
browser_get_text(ref?, selector?, page_name?) - Get text content of element
browser_is_visible(ref?, selector?, page_name?) - Check if element is visible
browser_is_enabled(ref?, selector?, page_name?) - Check if element is enabled
browser_is_checked(ref?, selector?, page_name?) - Check if checkbox/radio is checked
Preferred: Use browser_script for complete workflows
browser_script(actions=[
{"action": "goto", "url": "example.com"},
{"action": "waitForLoad"},
{"action": "findAndFill", "selector": "input[name='search']", "text": "query", "pressEnter": true},
{"action": "waitForNavigation"}
])
→ Returns step results + final page snapshot automatically
Alternative: Step-by-step when you need to inspect first
1. browser_navigate("example.com") - Go to page
2. browser_snapshot() - Find refs like [ref=e5]
3. browser_script or browser_sequence - Execute remaining actions

After EVERY action, verify it succeeded before proceeding:
Example verification flow:
# Click a submit button
browser_click(ref="e5")
# VERIFY: Check if success message appeared
browser_is_visible(selector=".success-message")
# Output: true
# If false, the action may have failed - investigate before proceeding
browser_snapshot() # See what actually happened
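The act, verify, inspect pattern in this flow can be captured in one small helper. This is only a sketch: `act_and_verify` is a hypothetical name, and the three callables are stand-ins for browser_click, browser_is_visible, and browser_snapshot so that the example runs offline.

```python
def act_and_verify(action, check, inspect):
    """Run action(), then check(); if the check fails, return inspect() for diagnosis.

    Returns (True, None) on success or (False, inspection_result) on failure.
    """
    action()
    if check():
        return True, None
    return False, inspect()


# Stand-ins for the real tool calls (assumptions, not the tools' API).
calls = []
ok, snapshot = act_and_verify(
    action=lambda: calls.append("click e5"),
    check=lambda: False,  # pretend .success-message never appeared
    inspect=lambda: "snapshot: form still visible",
)
```

The snapshot is only taken when the check fails, which keeps the successful path to two calls.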
Why this matters:
When verification fails:
When actions fail, the error message will tell you what to do:
| Error | What it means | What to do |
|---|---|---|
| "Element blocked by overlay" | Modal/popup covering element | Find close button, press Escape, or click outside |
| "Element not found" | Page changed, ref is stale | Run browser_snapshot() to get updated refs |
| "Multiple elements match" | Selector too broad | Use more specific ref from snapshot |
| "Element not visible" | Element exists but hidden | Scroll into view or wait for it to appear |
| "Page closed" | Tab was closed | Use browser_tabs(action="list") to find correct tab |
Never give up on first failure. Take a snapshot, understand what happened, then adapt.
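The error table maps naturally onto a lookup. The sketch below assumes that real error messages contain the quoted phrase somewhere in their text, which the table does not guarantee; `RECOVERY` and `suggest_recovery` are hypothetical names.

```python
# Error phrase -> advice, transcribed from the table.
RECOVERY = {
    "Element blocked by overlay": "Find close button, press Escape, or click outside",
    "Element not found": "Run browser_snapshot() to get updated refs",
    "Multiple elements match": "Use more specific ref from snapshot",
    "Element not visible": "Scroll into view or wait for it to appear",
    "Page closed": 'Use browser_tabs(action="list") to find correct tab',
}


def suggest_recovery(error_message):
    """Return the table's advice for the first known phrase found in error_message."""
    lowered = error_message.lower()
    for phrase, advice in RECOVERY.items():
        if phrase.lower() in lowered:
            return advice
    # Unknown error: fall back to the general rule of inspecting the page first.
    return "Take a snapshot, understand what happened, then adapt"
```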
ALWAYS check for new tabs after clicking links or buttons.
Many websites open content in new tabs. If you click something and the page seems unchanged, a new tab likely opened.
Workflow after clicking:
1. browser_click(ref="e5") - Click the element
2. browser_tabs(action="list") - Check if new tabs opened
3. browser_tabs(action="switch", index=N) - Switch to it
4. browser_snapshot() - Get content from correct tab

Example:
# Click a link that might open new tab
browser_click(ref="e3")
# Check tabs - ALWAYS do this after clicking!
browser_tabs(action="list")
# Output: Open tabs (2):
# 0: https://original.com
# 1: https://newpage.com
#
# Multiple tabs detected! Use browser_tabs(action="switch", index=N) to switch to another tab.
# New tab opened! Switch to it
browser_tabs(action="switch", index=1)
# Output: Switched to tab 1: https://newpage.com
#
# Now use browser_snapshot() to see the content of this tab.
# Now snapshot the new tab
browser_snapshot()
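Deciding which tab to switch to amounts to comparing the tab list before and after the click. The sketch below parses the "N: url" lines shown in the example output. `parse_tabs` and `new_tab_indexes` are hypothetical helpers, the output format is assumed to match the example exactly, and existing tabs are assumed to keep their indexes.

```python
import re


def parse_tabs(output):
    """Parse 'N: url' lines from a tab listing into {index: url}."""
    tabs = {}
    for line in output.splitlines():
        match = re.match(r"\s*(\d+):\s+(\S+)\s*$", line)
        if match:
            tabs[int(match.group(1))] = match.group(2)
    return tabs


def new_tab_indexes(before, after):
    """Return tab indexes present after the click but not before it."""
    old = parse_tabs(before)
    return [index for index in sorted(parse_tabs(after)) if index not in old]


before = "Open tabs (1):\n0: https://original.com"
after = "Open tabs (2):\n0: https://original.com\n1: https://newpage.com"
to_switch = new_tab_indexes(before, after)
```

A non-empty result gives the index to pass to the switch action; an empty result means the click stayed in the current tab.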
Signs you might be on the wrong tab:
When to check tabs:
browser_script(actions=[
{"action": "goto", "url": "google.com"},
{"action": "waitForLoad"},
{"action": "findAndFill", "selector": "textarea[name='q']", "text": "cute animals", "pressEnter": true},
{"action": "waitForNavigation"}
])
Returns: step results + final page snapshot (automatic)
browser_script(actions=[
{"action": "goto", "url": "example.com/login"},
{"action": "waitForLoad"},
{"action": "findAndFill", "selector": "input[type='email']", "text": "user@example.com"},
{"action": "findAndFill", "selector": "input[type='password']", "text": "mypassword"},
{"action": "findAndClick", "selector": "button[type='submit']"},
{"action": "waitForNavigation"}
])
Returns: step results + final page snapshot (automatic)
browser_script(actions=[
{"action": "goto", "url": "example.com/checkout"},
{"action": "waitForLoad"},
{"action": "findAndFill", "selector": "#name", "text": "John Doe"},
{"action": "findAndFill", "selector": "#address", "text": "123 Main St"},
{"action": "findAndFill", "selector": "#city", "text": "New York"},
{"action": "findAndFill", "selector": "#zip", "text": "10001"},
{"action": "findAndClick", "selector": "button.submit"},
{"action": "waitForNavigation"}
])
Returns: step results + final page snapshot (automatic)
When you need data from multiple pages (e.g. search results, listings, product comparisons), first collect the URLs, then extract data in bulk:
Step 1: Use browser_evaluate or browser_script to collect URLs from a search results page:
browser_evaluate(script="return [...document.querySelectorAll('a.listing-link')].map(a => a.href)")
Step 2: Extract data from all URLs in one call:
browser_batch_actions({
urls: ["https://example.com/listing/1", "https://example.com/listing/2", "..."],
extractScript: "return { title: document.querySelector('h1')?.textContent, price: document.querySelector('.price')?.textContent, details: document.querySelector('.details')?.textContent?.slice(0, 300) }",
waitForSelector: "h1"
})
Returns: compact JSON with results for each URL — no snapshots, no screenshots, minimal tokens.
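Links collected in Step 1 may contain duplicates or empty hrefs, so it can help to clean the list before Step 2. A minimal sketch under that assumption follows; `batch_payload` is a hypothetical helper that only assembles the arguments and does not call the tool.

```python
def batch_payload(urls, extract_script, wait_for=None):
    """Build browser_batch_actions arguments, dropping empty and duplicate URLs in order."""
    unique = list(dict.fromkeys(u for u in urls if u))
    payload = {"urls": unique, "extractScript": extract_script}
    if wait_for:
        payload["waitForSelector"] = wait_for
    return payload


payload = batch_payload(
    [
        "https://example.com/listing/1",
        "https://example.com/listing/2",
        "https://example.com/listing/1",  # duplicate link from the results page
        "",
    ],
    "return { title: document.querySelector('h1')?.textContent }",
    wait_for="h1",
)
```

Each URL is then visited once, which keeps the compact result aligned one-to-one with distinct pages.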
When to use browser_batch_actions vs browser_script:
- browser_batch_actions: Visiting multiple URLs to extract data from each. No interaction needed per page.
- browser_script: Performing a workflow on a single page (fill forms, click buttons, navigate).

IMPORTANT: For Google Docs/Sheets/Slides, navigate directly and use browser_keyboard:
browser_navigate(url="docs.google.com/document/create")
browser_click(x=640, y=300) # Focus editor
browser_keyboard(text="Hello, this is my document")
browser_keyboard(key="Enter")
browser_keyboard(text="Second paragraph")
browser_screenshot() # Verify
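The sequence above alternates text and Enter calls, one pair per paragraph. For longer documents, that sequence could be derived from a multi-line string. This sketch only produces the argument dicts for successive browser_keyboard calls; `keyboard_calls` is a hypothetical helper.

```python
def keyboard_calls(text):
    """Split text on newlines into browser_keyboard arguments: text chunks separated by Enter."""
    calls = []
    lines = text.split("\n")
    for i, line in enumerate(lines):
        if line:
            calls.append({"text": line})
        if i < len(lines) - 1:
            # One Enter per newline, so blank lines are preserved.
            calls.append({"key": "Enter"})
    return calls


calls = keyboard_calls("Hello, this is my document\nSecond paragraph")
```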
Direct URLs:
- docs.google.com/document/create
- docs.google.com/spreadsheets/create
- docs.google.com/presentation/create

When you encounter a login page (e.g., Google Sign-In, OAuth screens, authentication prompts):
This interactive login flow is essential because:
For saving/downloading content: