Build an Obsidian-compatible knowledge base from public web sources using the TinyFish CLI. Use this skill when a user wants a builder-grade markdown knowledge base on a technical topic, asks for a structured research vault, or wants a topic compiled from live public sources into interlinked markdown files. Supports two input modes: topic only, or topic plus starter URLs. Supports both first-build and update workflows. Always generates index.md, sources.md, audit.md, and manifest.json. Creates additional files only when the evidence supports them. The output must synthesize the topic into a usable mental model, not just summarize pages. Uses explicit tinyfish agent run commands and public web sources only. Optional `--trace` mode saves raw TinyFish outputs under `_trace/` for debugging.
Build a topic-specific markdown knowledge base by using TinyFish to browse public web sources and extract structured evidence.
This skill is for builder knowledge bases, not personal journals and not direct code generation.
The output is a folder you can drop into Obsidian immediately, and update later without starting over.
Do not produce a pile of source summaries.
The KB should help the reader understand:

- what the space actually is and how it is structured
- which sources and approaches matter most
- where sources agree, disagree, or stay silent
- what to read first and what to skip

If the output only says what each source said, the skill has failed.
Run both checks before any TinyFish call:
which tinyfish && tinyfish --version || echo "TINYFISH_CLI_NOT_INSTALLED"
tinyfish auth status
If TinyFish is not installed, stop and tell the user:
npm install -g @tiny-fish/cli
If TinyFish is not authenticated, stop and tell the user:
tinyfish auth login
Do not continue until both checks pass.
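The two checks above can be wrapped in a small preflight helper. This is an illustrative sketch, not part of the skill contract: the `require_cli` name and the `FOUND:` message are made up here, and the auth step is left as a comment since it requires a live login.

```shell
# Preflight sketch: stop early with a clear message if the CLI is missing.
# The sentinel string matches the check above; "FOUND:" is illustrative.
require_cli() {
  # $1 = command name, $2 = message to print when the command is missing
  if command -v "$1" >/dev/null 2>&1; then
    echo "FOUND: $1"
  else
    echo "$2"
    return 1
  fi
}

require_cli tinyfish "TINYFISH_CLI_NOT_INSTALLED" || true
# then: tinyfish auth status || echo "Run: tinyfish auth login"
```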
- Use explicit tinyfish agent run commands
- Always generate index.md, sources.md, audit.md, and manifest.json; everything else is dynamic

You support two modes: first build and update. Example requests:

- "Build me a knowledge base on web agent frameworks"
- "Build me a knowledge base on web agent frameworks and start from these URLs: ..."
- "Update my knowledge base on Kolmogorov-Arnold Networks with these new URLs: ..."
- "Build me a knowledge base on browser agents --trace"

If the topic is missing, ask for it before proceeding.
If starter URLs are present:

- record them as STARTER_URLS and guarantee each one is read in the reading pass
If the user explicitly says update, refresh, add these sources, or clearly wants to add to an existing KB, switch into update mode.
If the user includes --trace, trace, debug, or explicitly asks for raw outputs:

- set TRACE = true
- save raw TinyFish outputs under _trace/
- keep _trace/ out of the main page navigation unless the user asks for it

Create a folder named:
kb-{topic-slug}/
Examples:
- kb-web-agent-frameworks/
- kb-kolmogorov-arnold-networks/
- kb-landing-page-design-patterns/

When trace mode is enabled, also create:
kb-{topic-slug}/_trace/
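The {topic-slug} can be derived mechanically from the human-readable topic. A minimal sketch (the `slugify` helper name is illustrative):

```shell
# Illustrative slug derivation: lowercase, collapse runs of
# non-alphanumerics to '-', and trim dashes at the edges.
slugify() {
  printf '%s' "$1" \
    | tr '[:upper:]' '[:lower:]' \
    | sed -e 's/[^a-z0-9][^a-z0-9]*/-/g' -e 's/^-//' -e 's/-$//'
}

slugify "Web Agent Frameworks"
# -> web-agent-frameworks
```

The output folder is then `kb-$(slugify "$TOPIC")/`.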
### index.md

This file is always required. It should contain:

- a topic overview and mental model
- [[wikilinks]] to every other page in the KB
- key takeaways, gaps, and a reading order

### sources.md

This file is always required. It should log every URL visited with:

- a stable source ID and timestamp
- the URL, a label, the reason it was opened, and notes on what it yielded
Use ISO 8601 timestamps.
Each source entry must use a stable source ID such as S001, S002, S003.
Example:
## [S001] 2026-04-06T08:49:24.014Z | useful
- URL: https://example.com
- Label: Official docs
- Reason opened: discovery pass for {TOPIC}
- Notes: yielded 4 good follow-up links
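The ID and timestamp conventions above can be scripted. This is a sketch only: the helper names `next_source_id` and `log_source` are made up, the ID is derived by counting existing entries, and second-level ISO 8601 precision is assumed to be sufficient.

```shell
# Illustrative helpers for appending sources.md entries.
# next_source_id counts existing entries; log_source appends one section.
next_source_id() {
  last=$(grep -c '^## \[S' "$1" 2>/dev/null || true)
  printf 'S%03d' $(( ${last:-0} + 1 ))
}

log_source() {  # $1=file $2=status $3=url $4=label $5=reason
  id=$(next_source_id "$1")
  ts=$(date -u +%Y-%m-%dT%H:%M:%SZ)   # ISO 8601, UTC
  {
    echo "## [$id] $ts | $2"
    echo "- URL: $3"
    echo "- Label: $4"
    echo "- Reason opened: $5"
  } >> "$1"
}

log_source /tmp/kb_sources_demo.md useful "https://example.com" "Official docs" "discovery pass"
```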
### audit.md

This file is always required. It is the trust layer for the KB.
It must contain four sections:
- FOUND
- INFERRED
- CONFLICTING
- MISSING

Example:
# Audit
## FOUND
- [FOUND | S003] Pikachu is an Electric-type Mouse Pokemon.
## INFERRED
- [INFERRED | S003,S004] Pikachu's mascot role is reinforced across both official canon and encyclopedia framing.
## CONFLICTING
- [CONFLICTING | S004,S009] Source A says X while source B frames Y.
## MISSING
- [MISSING] No dedicated benchmark source was read in this run.
Rules:
- FOUND requires at least one direct source ID
- INFERRED should usually reference at least two source IDs
- CONFLICTING must name the disagreement explicitly
- MISSING should be used whenever the KB lacks evidence rather than hand-waving

### manifest.json

This file is always required. It stores:

- topic, topic_slug, mode, and trace
- created_at and last_updated_at
- the page list and a log of runs
Do not hardcode a fixed set like papers.md or repos.md for every topic.
Create additional files only when the topic actually supports them.
Common examples:
- papers.md
- repos.md
- docs.md
- articles.md
- datasets.md
- benchmarks.md
- people.md
- glossary.md
- timeline.md
- landscape.md
- reading-order.md
- disagreements.md
- what-matters.md

Rules:
- fold a thin category into index.md instead of creating a near-empty page
- create updates.md when the KB is refreshed in update mode
- for broad, source-rich topics, consider landscape.md, disagreements.md, and reading-order.md

All generated markdown files should use [[wikilinks]] when linking to other local pages.
Trace mode exception:
- files under _trace/ are debugging artifacts, not user-facing KB pages
- do not clutter index.md with _trace/ links unless the user explicitly asks

Use a two-pass workflow:

- pass 1: discovery, to find candidate URLs
- pass 2: reading, to extract evidence from the chosen URLs
Use one TinyFish run per URL.
Do not ask one TinyFish agent to cover multiple independent sites in a single command.
Run independent URLs in parallel where possible using background jobs and wait.
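The fan-out pattern looks like this in outline. The tinyfish call is stubbed with a subshell here so the shape is clear; file names under /tmp are illustrative.

```shell
# One background job per URL, then a single wait.
i=0
for url in "https://example.com/a" "https://example.com/b"; do
  i=$((i + 1))
  ( echo "{\"url\": \"$url\"}" > "/tmp/kb_demo_${i}.json" ) &   # one run per URL
done
wait   # blocks until every background job has exited
cat /tmp/kb_demo_1.json
```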
Determine whether this run is:
- build — creating a KB from scratch
- update — adding or refreshing sources in an existing KB

Also determine:

- TRACE = true or false

Use update mode when:

- the user explicitly says update, refresh, or add these sources
- an existing kb-{topic-slug}/ folder with a manifest.json is present

In update mode:

- read the existing index.md, sources.md, audit.md, and manifest.json before changing anything

Write down:

- TOPIC
- TOPIC_SLUG
- STARTER_URLS if provided
- MODE = build or update
- TRACE = true or false

Keep the topic human-readable in the markdown output.
If the user gave starter URLs:

- send every starter URL to the reading pass; do not make them compete with discovery results

If a starter URL is a direct arXiv paper page such as /abs/..., /pdf/..., or an arXiv HTML render:

- read it directly as a paper rather than routing it through a discovery search
Then expand with a small set of public discovery URLs relevant to the topic. Choose from these patterns when relevant:
https://github.com/search?q={TOPIC}&type=repositories
https://arxiv.org/search/?query={TOPIC}&searchtype=all
https://huggingface.co/models?search={TOPIC}
https://huggingface.co/datasets?search={TOPIC}
https://duckduckgo.com/?q={TOPIC}
Only include discovery URLs that are likely to produce useful public results.
Aim for 4-8 discovery URLs in the first pass, not 20.
Always reserve one extra discovery slot for a trusted-source scout that is not limited to the template list above.
Trusted-source scout rule:

- the scout run targets a general search results page and looks for trusted sources the template URLs would miss

Important:

- do not guess or fabricate discovery URLs; only open public URLs that plausibly exist
When selecting discovery and reading targets, prefer sources that improve understanding, not just coverage:

- foundational and canonical sources over rehashes
- sources that explain why the field is shaped the way it is, not only what each piece does
Do not spend most of your budget on redundant summaries of the same idea.
For each discovery URL, run TinyFish with a concrete extraction goal.
Command template:
tinyfish agent run --sync --url "{DISCOVERY_URL}" \
"You are helping build a markdown knowledge base on '{TOPIC}'.
Read this page and identify up to 5 high-value public URLs worth following.
Prefer official docs, canonical GitHub repos, papers, datasets, benchmarks, and
high-signal tutorials or explainers.
Return JSON:
{
\"candidates\": [
{
\"title\": \"\",
\"url\": \"\",
\"sourceType\": \"docs|repo|paper|dataset|article|benchmark|person|other\",
\"whyItMatters\": \"\"
}
]
}
Rules:
- public URLs only
- max 5 candidates
- do not guess URLs
- if nothing useful is found, return an empty array" \
> /tmp/kb_discovery_{SAFE_NAME}.json &
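The template references {SAFE_NAME} without defining it. One possible derivation, not mandated by TinyFish, is a filesystem-safe token built from the URL:

```shell
# One possible {SAFE_NAME} derivation: strip the scheme, then collapse
# every run of non-alphanumerics to a single underscore.
safe_name() {
  printf '%s' "$1" | sed -e 's|^https\{0,1\}://||' -e 's|[^A-Za-z0-9][^A-Za-z0-9]*|_|g'
}

safe_name "https://github.com/search?q=web+agents"
# -> github_com_search_q_web_agents
```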
Trusted-source scout template:
tinyfish agent run --sync --url "{GENERAL_SEARCH_URL}" \
"You are helping build a markdown knowledge base on '{TOPIC}'.
Search this results page for up to 5 trusted high-value sources that are
NOT already represented by the template discovery URLs.
Prefer official docs, official company or lab pages, standards bodies,
official benchmark sites, top conference project pages, and strong primary
source explainers from recognized builders or research groups.
Return JSON:
{
\"candidates\": [
{
\"title\": \"\",
\"url\": \"\",
\"sourceType\": \"docs|repo|paper|dataset|article|benchmark|person|other\",
\"whyItMatters\": \"\",
\"whyTrusted\": \"\"
}
]
}
Rules:
- public URLs only
- max 5 candidates
- do not guess URLs
- avoid low-signal SEO listicles unless they point to a stronger primary source
- if the template list already covers the best sources, return an empty array" \
> /tmp/kb_discovery_trusted_{SAFE_NAME}.json &
Important runtime behavior:
- because each run redirects its output with > /tmp/...json, that file may stay 0 bytes until the run exits

Timeout rule:

- if a run takes far longer than its peers, do not block the whole build; mark that source as partial or blocked in sources.md and move on

After launching all discovery runs:
wait
Interpretation rule:
- wait finishing slowly because one arXiv process is still running is expected behavior

If TRACE=true, copy or save the raw discovery outputs into _trace/ with readable names such as:
- _trace/discovery-github.json
- _trace/discovery-arxiv.json
- _trace/discovery-ddg.json
- _trace/discovery-trusted-scout.json

Then read all discovery outputs, merge them, deduplicate by URL, and choose the best 6-12 URLs for the reading pass.
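The dedupe step needs no extra tooling. A rough sketch, assuming the discovery outputs contain JSON "url" fields (with jq available, `jq -r '.candidates[].url'` would be the cleaner equivalent):

```shell
# Pull every "url" value out of the discovery outputs and deduplicate.
cat /tmp/kb_discovery_*.json 2>/dev/null \
  | grep -o '"url"[[:space:]]*:[[:space:]]*"[^"]*"' \
  | sed 's/.*"\([^"]*\)"$/\1/' \
  | sort -u > /tmp/kb_candidate_urls.txt
wc -l < /tmp/kb_candidate_urls.txt   # pick the best 6-12 from this list
```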
Trusted-source promotion rule:
- promote a scout candidate into the reading pass only when it clearly beats the template results
- record scout candidates in sources.md only if you actually opened them

Selection priority:

- official docs, canonical repos, and primary papers first
- benchmarks and datasets next
- high-signal tutorials and explainers last
By default, do not spend your budget on social posts, Reddit threads, or generic chatter unless the user explicitly asks for them.
Run one TinyFish agent per chosen URL.
Command template:
tinyfish agent run --sync --url "{TARGET_URL}" \
"You are extracting evidence for a markdown knowledge base on '{TOPIC}'.
Read this source carefully and return structured JSON.
Extract:
- title
- canonicalUrl
- sourceType
- shortSummary
- keyFindings: up to 7 bullets
- whyItMatters
- foundationality: foundational|important|derivative|unclear
- approachOrSchool: the main approach, camp, or framing this source represents
- whatThisChanges: one line on how this source changes the reader's understanding
- importantEntities: people, projects, libraries, datasets, papers, companies
- importantLinks: up to 5 URLs mentioned or linked from the page
- suggestedPages: page names this should contribute to, e.g. [\"repos\", \"papers\", \"docs\", \"articles\", \"benchmarks\"]
- evidenceQuality: high|medium|low
- limitations: things this page did not answer
If this is a GitHub repository:
- inspect the README
- inspect up to 3 important files or folders if they are clearly relevant
- include key files or folders under keyFindings
If this is a paper:
- extract the title, abstract-level contribution, and 3-5 implementation-relevant points
If this is documentation:
- extract concepts, APIs, workflows, and caveats
If this is a dataset or model page:
- extract task, modality, schema if visible, and usage constraints
Also extract:
- what this source says that is actually important
- what this source does NOT resolve
Return JSON only.
Do not invent facts. If something is missing, say it is missing." \
> /tmp/kb_read_{SAFE_NAME}.json &
After launching all reading runs:
wait
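After the wait returns, it helps to sweep the output files for failed runs before synthesizing. A small sketch, assuming the /tmp/kb_read_*.json naming convention above:

```shell
# Flag reading outputs that came back empty so those sources can be logged
# as partial or blocked in sources.md instead of silently disappearing.
for f in /tmp/kb_read_*.json; do
  [ -e "$f" ] || continue   # glob matched nothing; skip the literal pattern
  if [ ! -s "$f" ]; then
    echo "EMPTY: $f"
  fi
done
```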
If TRACE=true, save the raw reading outputs into _trace/ as well, for example:
- _trace/read-paper.json
- _trace/read-repo-main.json
- _trace/read-docs.json

Do not summarize _trace/ into the main KB pages. It exists for inspection, debugging, and trust when needed.
Before writing the synthesis pages, update sources.md with every visited URL.
In build mode:

- number sources sequentially starting from S001

In update mode:

- continue numbering from the highest existing source ID; never renumber old entries
Use one section per visited page. Do not skip failed or low-value pages.
Before or while writing the content pages, classify important claims into:
- FOUND
- INFERRED
- CONFLICTING
- MISSING

Use these rules:

- FOUND = directly supported by one or more sources
- INFERRED = synthesis across sources or a careful deduction
- CONFLICTING = sources disagree or frame something differently
- MISSING = the KB does not have enough evidence

The audit file is required even if it is short.
For especially important claims in topic pages, you may add inline markers like:
- [FOUND | S003] ...
- [INFERRED | S003,S004] ...
Use them sparingly. Do not turn every line into metadata noise.
Before deciding the final page set, synthesize the field as a field.
You must identify:

- the major approaches, camps, or schools of thought
- which sources are foundational and which are derivative
- where the field agrees, disagrees, or is still open
- what actually matters for a builder entering the space

If the topic is broad and source-rich, this synthesis should appear in:

- index.md
- landscape.md
- reading-order.md
- disagreements.md
- what-matters.md

Anti-summary rule:

- a page that merely restates each source in turn is a failure; every page must add structure, comparison, or judgment
Create the optional pages based on the actual evidence you found.
Good examples:
- papers.md
- repos.md
- docs.md
- benchmarks.md
- articles.md

If the topic does not have a category, skip that file.
Do not create a research-shaped output for topics that are not research-shaped.
Write clean markdown. Keep it skimmable and builder-friendly.
### index.md structure

Use this pattern:
# {TOPIC}
## Overview
{2-4 paragraph overview}
## Mental Model
{Explain the topic so a smart builder can actually understand the structure of the space.}
## Pages
- [[docs]]
- [[repos]]
- [[articles]]
## Key Takeaways
- ...
- ...
## Gaps
- ...
## Reading Order
- {what to read first}
- {what to read second}
- {what to skip until later}
## Source Log
- [[sources]]
In update mode, add a short section such as:
## This Run
- Mode: update
- Updated pages: [[papers]], [[docs]]
- See also: [[updates]]
Each optional page should:
- start with a # {Page Name} heading
- use [[wikilinks]] to sibling pages where relevant

Example:
# Repositories
## Canonical Repos
### Primary repository
- Summary: ...
- Why it matters: ...
- Key files or concepts: ...
- Related: [[docs]], [[articles]]
- Source: [GitHub](https://github.com/...)
### updates.md structure

Create this file only in update mode, or append to it if it already exists.
Pattern:
# Updates
## Run 2 | 2026-04-08T10:11:00Z
- Added sources: [S007], [S008]
- Updated pages: [[papers]], [[docs]]
- New confirmed claims:
- [FOUND | S008] ...
- Open conflicts:
- [CONFLICTING | S004,S008] ...
### manifest.json structure

At minimum, store:

- topic
- topic_slug
- mode
- trace
- created_at
- last_updated_at
- pages
- runs

Append a new run entry on each build or update.
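One shape that satisfies these requirements is sketched below. All values are illustrative, and the exact keys inside each run entry are not fixed by this skill:

```json
{
  "topic": "Web Agent Frameworks",
  "topic_slug": "web-agent-frameworks",
  "mode": "build",
  "trace": false,
  "created_at": "2026-04-06T08:49:24Z",
  "last_updated_at": "2026-04-06T08:49:24Z",
  "pages": ["index", "sources", "audit", "repos", "docs"],
  "runs": [
    { "run": 1, "mode": "build", "at": "2026-04-06T08:49:24Z", "urls_visited": 11 }
  ]
}
```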
Always follow these rules:
- index.md, sources.md, audit.md, and manifest.json are mandatory
- use [[wikilinks]] for local page references
- log every visited URL in sources.md

Good: a page set shaped by the evidence actually found.

Bad: the same research-shaped template stamped onto every topic.

The topic has no papers? That is fine. Do not create papers.md.

The topic has no datasets? That is fine. Do not create datasets.md.

The topic lives mostly in docs and articles? Create files like docs.md and articles.md.
Prefer fewer high-quality sources over many weak ones.
Use the starter URLs first, extract them well, and keep the KB narrow.
At the end, report:

- the output folder, mode, and trace setting
- the files created or updated
- how many URLs were visited
- the open gaps
Use a concise summary like:
KB Builder complete for {TOPIC}
Output: kb-{topic-slug}/
Mode: {MODE}
Trace: {TRACE}
Files: index.md, sources.md, ...
URLs visited: 11
Open gaps: benchmarks unclear, no public dataset page found