Use this skill when a task needs browser automation through PinchTab: open a website, inspect interactive elements, click through flows, fill out forms, scrape page text, log into sites with a persistent profile, export screenshots or PDFs, manage multiple browser instances, or fall back to the HTTP API when the CLI is unavailable. Prefer this skill for token-efficient browser work driven by stable accessibility refs such as `e5` and `e12`.
PinchTab gives agents a browser they can drive through stable accessibility refs, low-token text extraction, and persistent profiles or instances. Treat it as a CLI-first browser skill; use the HTTP API only when the CLI is unavailable or you need profile-management routes that do not exist in the CLI yet.
Preferred tool surface:
- `pinchtab` CLI commands first.
- `curl` for profile-management routes or non-shell/API fallback flows.
- `jq` only when you need structured parsing from JSON responses.

When multiple agents share one PinchTab server, always give each agent a stable ID.
- `pinchtab --agent-id <agent-id> ...`
- `PINCHTAB_AGENT_ID=<agent-id>`
- `X-Agent-Id: <agent-id>` on requests that should be attributed to that agent

That identity is recorded as `agentId` in activity events.

If you are switching between unrelated browser tasks, do not reuse the same agent ID unless you intentionally want one combined activity trail.
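A minimal sketch of keeping one agent ID consistent across the CLI and the HTTP API. The helper name `agent_header` and the ID value `research-agent-1` are hypothetical; the environment variable and header names are the ones listed above.

```shell
# Hypothetical agent ID; pick one stable value per agent.
PINCHTAB_AGENT_ID="research-agent-1"
export PINCHTAB_AGENT_ID

# Print the attribution header for curl-based requests.
agent_header() {
  printf 'X-Agent-Id: %s' "$PINCHTAB_AGENT_ID"
}

# Usage (not run here):
#   pinchtab nav https://example.com
#   curl -H "$(agent_header)" http://localhost:9867/tabs
```

Exporting the variable once keeps CLI calls and hand-written `curl` requests under the same identity.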
- Default to `http://localhost` targets. Only use a remote PinchTab server when the user explicitly provides it and, if needed, a token.
- Prefer the core commands: `text`, `snap -i -c`, `snap -d`, `find`, `click`, `fill`, `type`, `press`, `select`, `hover`, `scroll`.

Every PinchTab automation follows this pattern:
1. Navigate with `pinchtab nav <url>` or `pinchtab instance navigate <instance-id> <url>`.
2. Observe with `pinchtab snap -i -c`, `pinchtab snap --text`, or `pinchtab text`, then collect the current refs such as `e5`.
3. Act with `click`, `fill`, `type`, `press`, `select`, `hover`, or `scroll`.

Rules:
- Use `pinchtab text` when you need content, not layout.
- Use `pinchtab snap -i -c` when you need actionable elements.

PinchTab uses a unified selector system. Any command that targets an element accepts these formats:
| Selector | Example | Resolves via |
|---|---|---|
| Ref | e5 | Snapshot cache (fastest) |
| CSS | #login, .btn, [data-testid="x"] | document.querySelector |
| XPath | xpath://button[@id="submit"] | CDP search |
| Text | text:Sign In | Visible text match |
| Semantic | find:login button | Natural language query via /find |
Auto-detection: bare e5 → ref, #id / .class / [attr] → CSS, //path → XPath. Use explicit prefixes (css:, xpath:, text:, find:) when auto-detection is ambiguous.
pinchtab click e5 # ref
pinchtab click "#submit" # CSS (auto-detected)
pinchtab click "text:Sign In" # text match
pinchtab click "xpath://button[@type]" # XPath
pinchtab fill "#email" "[email protected]" # CSS
pinchtab fill e3 "[email protected]" # ref
The same syntax works in the HTTP API via the selector field:
{"kind": "click", "selector": "text:Sign In"}
{"kind": "fill", "selector": "#email", "text": "[email protected]"}
{"kind": "click", "selector": "e5"}
Legacy ref field is still accepted for backward compatibility.
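The auto-detection rules described above can be mirrored in a small helper, which is handy for checking whether a selector needs an explicit prefix before sending it. This is an illustration of the documented rules only, not PinchTab's actual resolver; the function name `selector_kind` is hypothetical.

```shell
# Classify a selector string according to the documented rules.
# Prints one of: ref, css, xpath, text, semantic, ambiguous.
selector_kind() {
  case "$1" in
    css:*)   echo css ;;
    xpath:*) echo xpath ;;
    text:*)  echo text ;;
    find:*)  echo semantic ;;
    //*)     echo xpath ;;
    '#'*|.*|'['*) echo css ;;
    e[0-9]*)
      # A ref is "e" followed by digits only.
      case "${1#e}" in
        *[!0-9]*) echo ambiguous ;;
        *)        echo ref ;;
      esac ;;
    *) echo ambiguous ;;
  esac
}
```

When the helper prints `ambiguous`, add an explicit prefix such as `text:` or `find:` rather than relying on auto-detection.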
Use && only when you do not need to inspect intermediate output before deciding the next step.
Good:
pinchtab nav https://pinchtab.com && pinchtab snap -i -c
pinchtab click --wait-nav e5 && pinchtab snap -i -c
pinchtab nav https://pinchtab.com --block-images && pinchtab text
Run commands separately when you must read the snapshot output first:
pinchtab nav https://pinchtab.com
pinchtab snap -i -c
# Read refs, choose the correct e#
pinchtab click e7
pinchtab snap -i -c
If a page shows a challenge instead of content (e.g., "Just a moment..."), call POST /solve with {"maxAttempts": 3} to auto-detect and resolve it. Use POST /tabs/TAB_ID/solve to scope the solve to a single tab. The solver works best with stealthLevel: "full" in config. It is safe to call speculatively: it returns immediately if no challenge is present. See api.md for full solver options.
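A minimal sketch of the two solve routes mentioned above, wrapped so the tab ID is optional. The function names `solve_url` and `solve_challenge` are hypothetical, and the base URL is whatever port your instance listens on.

```shell
# Build the solve URL: instance-wide when no tab ID is given, tab-scoped otherwise.
solve_url() {
  base="$1"; tab_id="$2"
  if [ -n "$tab_id" ]; then
    printf '%s/tabs/%s/solve' "$base" "$tab_id"
  else
    printf '%s/solve' "$base"
  fi
}

# POST the documented request body to the chosen route.
solve_challenge() {
  curl -s -X POST "$(solve_url "$1" "$2")" \
    -H "Content-Type: application/json" \
    -d '{"maxAttempts": 3}'
}

# Usage (not run here):
#   solve_challenge http://localhost:9868 ""        # active tab
#   solve_challenge http://localhost:9868 TAB_ID    # one specific tab
```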
Pick a pattern before interacting with the site:
- `pinchtab instance start` → use `--server http://localhost:<port>` for commands.
- `pinchtab instance start --profile work --mode headed` → switch to `--mode headless` after login is stored.
- `POST /profiles` with `{"name":"..."}`, then `POST /profiles/<name>/start`.
- `POST /instances/start`, then target the instance port with `curl`. Send `X-Agent-Id` for attribution.

If the server is exposed beyond localhost, require a token. See TRUST.md.
Agent sessions: Each agent can get its own revocable session token via `pinchtab session create --agent-id <id>` or `POST /sessions`. Set `PINCHTAB_SESSION=ses_...` or send `Authorization: Session ses_...`. Sessions have an idle timeout (default 30m) and a maximum lifetime (default 24h).
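A small sketch of turning a session token into the header form described above. It assumes the token was already obtained and copied by hand, since the output format of `pinchtab session create` is not shown here. The function name `session_auth_header` is hypothetical.

```shell
# Print the Authorization header for a session token.
# Rejects values that do not carry the documented "ses_" prefix.
session_auth_header() {
  case "$1" in
    ses_*) printf 'Authorization: Session %s' "$1" ;;
    *)     echo "not a session token: $1" >&2; return 1 ;;
  esac
}

# Usage (not run here):
#   curl -H "$(session_auth_header "$PINCHTAB_SESSION")" http://localhost:9867/tabs
```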
pinchtab server # Start server foreground
pinchtab daemon install # Install as system service
pinchtab health # Check server status
pinchtab instances # List running instances
pinchtab profiles # List available profiles
pinchtab --server http://localhost:9868 snap -i -c # Target specific instance
pinchtab nav <url>
pinchtab nav <url> --new-tab
pinchtab nav <url> --tab <tab-id>
pinchtab nav <url> --block-images
pinchtab nav <url> --block-ads
pinchtab back # Navigate back in history
pinchtab forward # Navigate forward
pinchtab reload # Reload current page
pinchtab tab # List tabs or focus by ID
pinchtab tab new <url>
pinchtab tab close <tab-id>
pinchtab instance navigate <instance-id> <url>
pinchtab snap
pinchtab snap -i # Interactive elements only
pinchtab snap -i -c # Interactive + compact
pinchtab snap -d # Diff from previous snapshot
pinchtab snap --selector <css> # Scope to CSS selector
pinchtab snap --max-tokens <n> # Token budget limit
pinchtab snap --text # Text output format
pinchtab text # Page text content
pinchtab text --raw # Raw text extraction
pinchtab find <query> # Semantic element search
pinchtab find --ref-only <query> # Return refs only
Guidance:
- `snap -i -c` is the default for finding actionable refs.
- `snap -d` is the default follow-up snapshot for multi-step flows.
- `text` is the default for reading articles, dashboards, reports, or confirmation messages.
- `find --ref-only` is useful when the page is large and you already know the semantic target.
- `snap -i` and full `snap` use different numbering. Do not mix them: if you snapshot with `-i`, use those refs. If you re-snapshot without `-i`, get fresh refs before acting.

All interaction commands accept unified selectors (refs, CSS, XPath, text, semantic). See the Selectors section above.
pinchtab click <selector> # Click element
pinchtab click --wait-nav <selector> # Click and wait for navigation
pinchtab click --x 100 --y 200 # Click by coordinates
pinchtab dblclick <selector> # Double-click element
pinchtab type <selector> <text> # Type with keystrokes
pinchtab fill <selector> <text> # Set value directly
pinchtab press <key> # Press key (Enter, Tab, Escape...)
pinchtab hover <selector> # Hover element
pinchtab select <selector> <value> # Select dropdown option
pinchtab scroll <selector|pixels> # Scroll element or page
Rules:
- Use `fill` for deterministic form entry.
- Use `type` only when the site depends on keystroke events.
- Use `click --wait-nav` when a click is expected to navigate.
- Re-snapshot after `click`, `press Enter`, `select`, or `scroll` if the UI can change.
- For dropdowns, snapshot with `filter=interactive` first: the output shows `<option>` elements with their `value` attributes. Then use `select` with the exact value.

pinchtab screenshot
pinchtab screenshot -o /tmp/pinchtab-page.png # Format driven by extension
pinchtab screenshot -q 60 # JPEG quality
pinchtab pdf
pinchtab pdf -o /tmp/pinchtab-report.pdf
pinchtab pdf --landscape
Use these only when the task explicitly requires them and safer commands are insufficient.
pinchtab eval "document.title"
pinchtab download <url> -o /tmp/pinchtab-download.bin
pinchtab upload /absolute/path/provided-by-user.ext -s <css>
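One way to keep downloads out of arbitrary filesystem locations is to create a fresh temporary directory per download. The sketch below assumes only the `pinchtab download <url> -o <path>` form shown above; the function name `safe_download` and the output file name are hypothetical.

```shell
# Download into a newly created temporary directory and print the resulting path.
safe_download() {
  url="$1"
  dir="$(mktemp -d)" || return 1
  pinchtab download "$url" -o "$dir/pinchtab-download.bin" || return 1
  printf '%s\n' "$dir/pinchtab-download.bin"
}

# Usage (not run here):
#   path="$(safe_download https://example.com/report.bin)"
```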
Rules:
- `eval` is for narrow, read-only DOM inspection unless the user explicitly asks for a page mutation.
- `download` should prefer a safe temporary or workspace path over an arbitrary filesystem location.
- `upload` requires a file path the user explicitly provided or clearly approved for the task.

curl -X POST http://localhost:9868/navigate \
-H "Content-Type: application/json" \
-d '{"url":"https://example.com"}'
curl "http://localhost:9868/snapshot?filter=interactive&format=compact"
curl -X POST http://localhost:9868/action \
-H "Content-Type: application/json" \
-d '{"kind":"fill","selector":"e3","text":"[email protected]"}'
curl http://localhost:9868/text
## Instance-scoped solve (instance port, not server port)
curl -X POST http://localhost:9868/solve \
-H "Content-Type: application/json" \
-d '{"maxAttempts": 3}'
curl http://localhost:9868/solvers
Use the API when the CLI is unavailable or when you need profile-management routes that do not exist in the CLI yet.
Important: Each POST /navigate creates a new tab by default. The default (non-tab-scoped) endpoints like /snapshot, /action, /text operate on the active tab, which may not be the one you just navigated. In multi-tab workflows, always use tab-scoped routes to avoid acting on the wrong page.
Get the tab ID from the navigate response or from GET /tabs.
# List all tabs
curl http://localhost:9867/tabs \
-H "Authorization: Bearer <token>"
# Navigate in a specific tab (does not create a new tab)
curl -X POST http://localhost:9867/tabs/TAB_ID/navigate \
-H "Authorization: Bearer <token>" \
-H "Content-Type: application/json" \
-d '{"url":"https://example.com"}'
# Snapshot a specific tab
curl "http://localhost:9867/tabs/TAB_ID/snapshot?filter=interactive&format=compact" \
-H "Authorization: Bearer <token>"
# Get text from a specific tab
curl http://localhost:9867/tabs/TAB_ID/text \
-H "Authorization: Bearer <token>"
# Perform action on a specific tab
curl -X POST http://localhost:9867/tabs/TAB_ID/action \
-H "Authorization: Bearer <token>" \
-H "Content-Type: application/json" \
-d '{"kind":"click","selector":"#submit-btn"}'
# Navigate back/forward in a specific tab
curl -X POST http://localhost:9867/tabs/TAB_ID/back \
-H "Authorization: Bearer <token>"
curl -X POST http://localhost:9867/tabs/TAB_ID/forward \
-H "Authorization: Bearer <token>"
# Screenshot (GET, not POST)
curl http://localhost:9867/tabs/TAB_ID/screenshot \
-H "Authorization: Bearer <token>" \
--output screenshot.png
# PDF export (GET or POST)
curl http://localhost:9867/tabs/TAB_ID/pdf \
-H "Authorization: Bearer <token>" \
--output page.pdf
# Close a tab
curl -X POST http://localhost:9867/tabs/TAB_ID/close \
-H "Authorization: Bearer <token>"
The default (non-tab-scoped) endpoints also support screenshots and PDF:
# Screenshot of active tab (GET)
curl http://localhost:9867/screenshot \
-H "Authorization: Bearer <token>" \
--output screenshot.png
# PDF of active tab (GET or POST)
curl http://localhost:9867/pdf \
-H "Authorization: Bearer <token>" \
--output page.pdf
Navigation with waitNav: When clicking a link or button that triggers page navigation, include "waitNav": true in the action body. Without it, PinchTab returns a navigation_changed error to protect against unexpected navigation during form interactions.
{"kind": "click", "selector": "#search-btn", "waitNav": true}
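A minimal sketch of building the click body with and without `waitNav`, so the flag is set deliberately for each action. The function name `click_body` is hypothetical, and the helper does not escape quotes, so use `jq` or similar for selectors that contain `"`.

```shell
# Print a click action body; pass "true" as the second argument
# when the click is expected to navigate.
click_body() {
  selector="$1"
  if [ "$2" = "true" ]; then
    printf '{"kind":"click","selector":"%s","waitNav":true}' "$selector"
  else
    printf '{"kind":"click","selector":"%s"}' "$selector"
  fi
}

# Usage (not run here):
#   curl -X POST http://localhost:9867/tabs/TAB_ID/action \
#     -H "Content-Type: application/json" \
#     -d "$(click_body '#search-btn' true)"
```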
All tab-scoped routes follow the pattern /tabs/{TAB_ID}/... and mirror the default endpoints. The full list includes: navigate, back, forward, reload, snapshot, screenshot, text, pdf, action, actions, dialog, wait, find, lock, unlock, cookies, metrics, network, solve, close, storage, evaluate, download, upload.
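The route pattern above lends itself to a small URL builder that rejects anything outside the listed routes. This is a sketch; the function name `tab_url` is hypothetical, and the route list is copied from the paragraph above.

```shell
# Build <base>/tabs/<TAB_ID>/<route>, accepting only the documented routes.
tab_url() {
  base="$1"; tab_id="$2"; route="$3"
  case "$route" in
    navigate|back|forward|reload|snapshot|screenshot|text|pdf|action|actions|\
dialog|wait|find|lock|unlock|cookies|metrics|network|solve|close|storage|\
evaluate|download|upload)
      printf '%s/tabs/%s/%s' "$base" "$tab_id" "$route" ;;
    *)
      echo "unknown tab route: $route" >&2
      return 1 ;;
  esac
}

# Usage (not run here):
#   curl "$(tab_url http://localhost:9867 TAB_ID text)" -H "Authorization: Bearer <token>"
```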
pinchtab nav https://pinchtab.com && pinchtab snap -i -c
pinchtab nav https://example.com/login
pinchtab snap -i -c
pinchtab fill e3 "[email protected]"
pinchtab fill e4 "correct horse battery staple"
pinchtab click --wait-nav e5
pinchtab text
pinchtab nav https://example.com/search
pinchtab snap -i -c
pinchtab fill e2 "quarterly report"
pinchtab click e3 # Click the Search button
pinchtab text
Form submission: Always click the submit button rather than using press Enter. Some sites attach their submission logic to the button's click handler, so an Enter keypress may not trigger it.
pinchtab nav https://example.com/checkout
pinchtab snap -i -c
pinchtab click e8
pinchtab snap -d -i -c
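The act-then-diff step in the flow above can be wrapped in a helper for long workflows. A minimal sketch, using only the commands shown in this document; the function name `act_then_diff` is hypothetical.

```shell
# Run one interaction command, then print a compact interactive diff snapshot.
# All arguments are passed straight to pinchtab.
act_then_diff() {
  pinchtab "$@" && pinchtab snap -d -i -c
}

# Usage (not run here):
#   act_then_diff click e8
#   act_then_diff fill "#search" "quarterly report"
```

Because of the `&&`, the diff snapshot only runs when the action itself succeeds, so a failed click never produces a misleading diff.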
When you know the page structure, skip the snapshot and use CSS or text selectors directly:
pinchtab click "text:Accept Cookies"
pinchtab fill "#search" "quarterly report"
pinchtab click "xpath://button[@type='submit']"
- Prefer `text`, `snap -i -c`, and `snap -d` before screenshots, PDFs, eval, downloads, or uploads.
- Use `--block-images` for read-heavy tasks that do not need visual assets.
- Run `pinchtab snap -d` after each state-changing action in long workflows.
- Use `pinchtab text` to confirm success messages, table updates, or navigation outcomes.
- Use `pinchtab screenshot` only when visual regressions, CAPTCHA, or layout-specific confirmation matters.