Trove collection and normalization for swain-design artifacts. Collects sources from the web, local files, and media (video/audio), normalizes them to markdown, and caches them in reusable troves. Use when researching a topic for a spike, ADR, vision, or any artifact that needs structured research. Also use to refresh stale troves or extend existing ones with new sources. Triggers on: 'research X', 'gather sources for', 'compile research on', 'search for sources about', 'refresh the trove', 'find existing research on X', or when swain-design needs research inputs for a spike or ADR.
Collect, normalize, and cache source materials into reusable troves that swain-design artifacts can reference.
| Signal | Mode |
|---|---|
| No trove exists for the topic, or user says "research X" / "gather sources" | Create — new trove |
| Trove exists and user provides new sources or says "add to" / "extend" | Extend — add sources to existing trove |
| Trove exists and user says "refresh" or sources are past TTL | Refresh — re-fetch stale sources |
| User asks "what troves do we have" or "find sources about X" | Discover — search existing troves by tag |
Before creating a new trove or running web searches, scan existing troves for relevant content. This avoids duplicating research and surfaces connections to prior work.
# Search trove manifests by tag
grep -rl "<keyword>" docs/troves/*/manifest.yaml 2>/dev/null
# Search trove source content
grep -rl --include='*.md' "<keyword>" docs/troves/*/sources/ 2>/dev/null
# Search trove syntheses
grep -rl "<keyword>" docs/troves/*/synthesis.md 2>/dev/null
If existing troves contain relevant sources:
This step runs in Create, Extend, and Discover modes, always before any web searches. Existing trove content is checked first.
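The three searches above can be folded into one helper that reports which troves mention a keyword and where. This is a minimal sketch, not part of the skill: the function name `find_troves` and its output format are hypothetical, and it assumes each trove lives at `docs/troves/<trove-id>/` with a `manifest.yaml`, a `synthesis.md`, and a `sources/` directory. It uses `grep -F`, so the keyword is matched literally rather than as a regular expression.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list troves that mention a keyword, and where.
# Usage: find_troves <keyword> [troves-root]
find_troves() {
  local keyword="$1" root="${2:-docs/troves}" trove hits
  for trove in "$root"/*/; do
    [ -d "$trove" ] || continue
    hits=""
    # Manifest (tags, titles)
    if grep -qF -- "$keyword" "${trove}manifest.yaml" 2>/dev/null; then
      hits="$hits manifest"
    fi
    # Normalized sources, at any depth under sources/
    if grep -rqF --include='*.md' -- "$keyword" "${trove}sources" 2>/dev/null; then
      hits="$hits sources"
    fi
    # Synthesis
    if grep -qF -- "$keyword" "${trove}synthesis.md" 2>/dev/null; then
      hits="$hits synthesis"
    fi
    if [ -n "$hits" ]; then
      printf '%s:%s\n' "$(basename "$trove")" "$hits"
    fi
  done
  return 0
}
```

Calling `find_troves websocket` prints one line per matching trove, naming the trove and the places where the keyword was found.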
Build a new trove from scratch.
Ask the user (or infer from context) for:
- websocket-vs-sse). Suggest one if the context is clear.
- real-time, websocket, sse)

If invoked from swain-design (e.g., spike entering Active), the artifact context provides the topic, tags, and sometimes initial sources.
For each source, use the appropriate capability. Read skills/swain-search/references/normalization-formats.md for the exact markdown structure per source type.
Web search queries:
Web page URLs:
- failed: true flag and move on

Paywall proxy fallback:
After fetching a web page, check if a paywall proxy is available for the URL's domain:
skills/swain-search/scripts/resolve-proxy.sh <url>
- PROXY:<name>:<proxy-url> and SIGNAL:<text> lines
- SIGNAL text (case-sensitive literal match)
- <url> — trying proxy fallback"
- PROXY URL in order, fetching via the same page-fetching capability used for web pages
- proxy-used: <name> and notes: "Full article retrieved via <name> proxy" in the manifest entry
- notes: "Paywalled; proxies exhausted — content from direct fetch only"

The registry lives at skills/swain-search/references/paywall-proxies.yaml. Add new domains or proxies there; no skill file changes are needed.
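The resolver's line format can be handled with plain parameter expansion. The sketch below is illustrative only: the function names `parse_proxy_lines` and `is_paywalled` are hypothetical, and it assumes the resolver writes its `PROXY:<name>:<proxy-url>` and `SIGNAL:<text>` lines to standard output. Because a proxy URL itself contains colons, the name is taken up to the first colon after the `PROXY:` prefix and everything after that is treated as the URL.

```shell
#!/usr/bin/env bash
# Hypothetical parser: turn resolver lines on stdin into tab-separated records.
#   PROXY:<name>:<proxy-url>  ->  proxy<TAB><name><TAB><proxy-url>
#   SIGNAL:<text>             ->  signal<TAB><text>
parse_proxy_lines() {
  local line rest
  while IFS= read -r line; do
    case "$line" in
      PROXY:*)
        rest="${line#PROXY:}"            # drop the prefix
        # name = up to first colon; URL = everything after it
        printf 'proxy\t%s\t%s\n' "${rest%%:*}" "${rest#*:}"
        ;;
      SIGNAL:*)
        printf 'signal\t%s\n' "${line#SIGNAL:}"
        ;;
    esac
  done
}

# Hypothetical check: succeed if the fetched page body contains any signal
# text as a case-sensitive literal (grep -F does no regex interpretation).
# Usage: is_paywalled <body-file> <signal>...
is_paywalled() {
  local body_file="$1" signal
  shift
  for signal in "$@"; do
    if grep -qF -- "$signal" "$body_file"; then
      return 0
    fi
  done
  return 1
}
```

Under those assumptions, a typical call would be `skills/swain-search/scripts/resolve-proxy.sh "$url" | parse_proxy_lines`, with the signal records then passed to `is_paywalled` together with the saved page body.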
Video/audio URLs:
Local files:
Forum threads / discussions:
Repositories:
- sources/<source-id>/
- selective: true in the manifest entry
- highlights array with paths to the most important files (relative to the source-id directory)

Documentation sites:
- sources/<source-id>/
- selective: true
- highlights array with paths to the most important pages

Each normalized source gets a slug-based source ID and lives in a directory-per-source layout:
- sources/<source-id>/<source-id>.md
- sources/<source-id>/ with the original tree mirrored inside

Source ID generation:
- mdn-websocket-api, strangeloop-2025-realtime)
- __word1-word2 using two random words from skills/swain-search/references/wordlist.txt
- __ followed by 4 hex characters (e.g., __a3f8) as a fallback

Create manifest.yaml following the schema in skills/swain-search/references/manifest-schema.md. Include:
Compute content hashes as bare hex SHA-256 digests (no prefix) of the normalized markdown content:
shasum -a 256 sources/mdn-websocket-api/mdn-websocket-api.md | cut -d' ' -f1
Create synthesis.md — a structured distillation of key findings across all sources.
Structure the synthesis by theme, not by source. Group related findings together, cite sources by ID, and surface:
Keep it concise. The synthesis is a starting point, not a comprehensive report — the user or artifact author will refine it.
Use the dual-commit pattern (same as swain-design lifecycle stamps) to give the trove a reachable commit hash.
Before Commit A — append a history entry to manifest.yaml with a -- placeholder for the commit hash: