Structured web scraping for AI coders: explore, then exploit with shipped templates, runner, and hooks.
WISE teaches an AI coding agent structured, repeatable web scraping for JS-rendered sites. The goal is a working scraping project built from shipped WISE assets.
Rule 0 — Orient before acting. Before opening a browser or writing any code, read
references/guide.md § Big Picture to understand what you're building and what decisions you need to make. Only then start exploration.
Orient → Explore → Evidence → Choose tier → Exploit → TreeRecord → Assemble
agent-browser, test selectors, map navigation
Use when: JS-rendered sites, pagination, UI state, filter combos, structured repeatable output.
Not when: a stable API/export exists, or static curl is clearly enough.
WISE profiles define a graph of NER nodes. Each node is a deterministic (state, action) → observation triple:
| Part | Schema field | What it answers |
|---|---|---|
| State | state | "Am I where I expect to be?" — precondition check |
| Action | action | "What deterministic thing do I do?" — browser primitives |
| Observation | extract | "What do I read/emit from this state?" — extraction rules |
| Successors | expand | "How many successor states?" — elements, pages, or combinations |
| Retry | retry | "Re-execute parent actions if state check fails" — { max, delay_ms } |
Nodes form a DAG via parents[]. The engine walks top-down: check state, execute actions, extract, expand, recurse into children. The root node (named in entry.root) must have parents: [] — it is the entry point and has no parent. All other nodes must reference at least one parent that exists in the same resource's nodes list.
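The parent rules above can be checked mechanically before a run. The following is a minimal sketch, not WISE's own validator: it assumes a resource is loaded as a plain dict with `entry.root` and a `nodes` list, and that each node carries a `name` key (the `name` key and the `validate_nodes` helper are hypothetical).

```python
from __future__ import annotations

from typing import Any


def validate_nodes(resource: dict[str, Any]) -> list[str]:
    """Return a list of graph problems; an empty list means the rules hold."""
    nodes = resource.get("nodes", [])
    root = resource.get("entry", {}).get("root")
    names = {node["name"] for node in nodes}  # assumed `name` key per node
    problems: list[str] = []

    if root not in names:
        problems.append(f"entry.root {root!r} is not in nodes")

    for node in nodes:
        parents = node.get("parents", [])
        if node["name"] == root:
            # The root is the entry point and must not have parents.
            if parents:
                problems.append(f"root {root!r} must have parents: []")
            continue
        # Every other node needs at least one parent in the same nodes list.
        if not any(parent in names for parent in parents):
            problems.append(
                f"{node['name']!r} must reference at least one existing parent"
            )
    return problems


resource = {
    "entry": {"root": "listing"},
    "nodes": [
        {"name": "listing", "parents": []},
        {"name": "detail", "parents": ["listing"]},
        {"name": "orphan", "parents": ["missing"]},
    ],
}
print(validate_nodes(resource))
```

Here only `orphan` is reported, because its single parent does not exist in the resource.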
Extraction types: text, attr, link, table, ai, html, image, grouped. See references/field-guide.md § Extraction for details.
Action types: click, select, scroll, wait, reveal, navigate, input. See references/field-guide.md § Actions for details.
Instead of separate type: pagination / type: matrix / multiple: true, all successor-state generation goes through expand:
| expand.over | What it does | Old equivalent |
|---|---|---|
| elements | One successor per CSS match | multiple: true |
| pages | One successor per page (next/numeric/infinite) | type: pagination |
| combinations | Cartesian product of filter axes | type: matrix |
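To make the `combinations` row concrete, here is a small sketch of a Cartesian product over filter axes. The axis names and values are invented for illustration, and the `combinations` helper is hypothetical; it shows the idea, not how the WISE runner builds successor states.

```python
from __future__ import annotations

from itertools import product


def combinations(axes: dict[str, list[str]]) -> list[dict[str, str]]:
    """Build one successor state per combination of filter values."""
    names = list(axes)
    return [
        dict(zip(names, values))
        for values in product(*(axes[name] for name in names))
    ]


# Hypothetical filter axes: 2 categories x 3 sort orders = 6 successor states.
axes = {"category": ["books", "music"], "sort": ["price", "rating", "newest"]}
states = combinations(axes)
print(len(states))  # 6
print(states[0])    # {'category': 'books', 'sort': 'price'}
```

The number of successors is the product of the axis sizes, so adding an axis multiplies the work. That is worth keeping in mind when choosing which filters to expand.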
Each expand block supports order: dfs | bfs (default: dfs).
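The difference between the two orders can be seen on a toy successor tree. This is an illustrative sketch only: the `walk` function and the node names are hypothetical, and the actual engine's traversal may differ in detail.

```python
from __future__ import annotations

from collections import deque


def walk(root: str, children: dict[str, list[str]], order: str = "dfs") -> list[str]:
    """Visit successor states depth-first or breadth-first."""
    frontier = deque([root])
    visited: list[str] = []
    while frontier:
        # DFS takes the most recently added state; BFS takes the oldest.
        node = frontier.pop() if order == "dfs" else frontier.popleft()
        visited.append(node)
        kids = children.get(node, [])
        # For DFS, push in reverse so the first child is visited first.
        frontier.extend(reversed(kids) if order == "dfs" else kids)
    return visited


# Hypothetical expansion: a listing expands to pages, pages expand to items.
children = {
    "listing": ["page1", "page2"],
    "page1": ["item1", "item2"],
    "page2": ["item3"],
}
print(walk("listing", children, "dfs"))
# ['listing', 'page1', 'item1', 'item2', 'page2', 'item3']
print(walk("listing", children, "bfs"))
# ['listing', 'page1', 'page2', 'item1', 'item2', 'item3']
```

With dfs, each page's items are finished before the next page is opened. With bfs, all pages are visited before any item.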
Stop conditions on page expansion: sentinel (CSS appears), sentinel_gone (CSS disappears), stable (element count stops changing), limit (hard cap). See references/field-guide.md § Stop Conditions.
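One way to reason about the four stop conditions is as a check run after each page load. The sketch below is a hedged illustration: `PageSnapshot`, `should_stop`, and the `stable_rounds` parameter are hypothetical names, and the exact semantics in WISE are defined in references/field-guide.md, not here.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class PageSnapshot:
    """Hypothetical per-page observation; not a WISE type."""
    sentinel_present: bool   # the `sentinel` selector matches on this page
    watched_present: bool    # the `sentinel_gone` selector still matches
    element_count: int       # number of matched elements so far


def should_stop(
    history: list[PageSnapshot], limit: int, stable_rounds: int = 1
) -> Optional[str]:
    """Return the name of the stop condition that fired, or None to continue."""
    latest = history[-1]
    if latest.sentinel_present:
        return "sentinel"
    if not latest.watched_present:
        return "sentinel_gone"
    # stable: element count unchanged across the last `stable_rounds` loads.
    recent = history[-(stable_rounds + 1):]
    if len(recent) == stable_rounds + 1 and len({s.element_count for s in recent}) == 1:
        return "stable"
    if len(history) >= limit:
        return "limit"
    return None


# Infinite scroll that stops yielding new items after the second load.
history = [
    PageSnapshot(False, True, 20),
    PageSnapshot(False, True, 40),
    PageSnapshot(False, True, 40),
]
print(should_stop(history[:2], limit=10))  # None
print(should_stop(history, limit=10))      # stable
```

Keeping `limit` as the last check makes it a backstop: it only fires when none of the content-based conditions has already ended the expansion.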
Emit + expand interaction: When a node has both emit and expand, node-level extraction is skipped; extraction happens per-element inside the expansion. See references/guide.md § Data Flow.
After exploration, the agent declares what data it expects to produce in the artifacts block. This serves as:
consumes / produces wire resources into a DAG