Ingest raw sources into an LLM Wiki vault. Reads unprocessed files from raw/, creates structured wiki pages (sources, entities, concepts, synthesis), links everything with wikilinks, updates the index and log. Use when the user says 'ingest', 'process raw', 'process clippings', 'update wiki', or when raw/ contains unread items that need processing.
Automate the ingest operation of the LLM Wiki pattern. Transform raw source material into structured, interlinked wiki pages.
Before doing anything else:

1. Confirm the working directory is an Obsidian vault (it contains `.obsidian/`). Otherwise, ask.
2. Read CLAUDE.md. This is the schema — it defines page types, frontmatter, naming conventions, and folder paths. Follow it exactly. If no CLAUDE.md exists, this vault is not an LLM Wiki vault — tell the user and stop.
3. Read `.obsidian/community-plugins.json` to know what plugins are available (Dataview, Templater, Excalidraw, etc.).

If the user provides arguments ($ARGUMENTS), interpret them as:
- a specific file in `raw/` to process
- `all` to process every unread item

Always start here. Before scanning `raw/`, use AskUserQuestion to ask the user:
Question 1 — "Do you have new material to add before processing?" with options: "Yes" (they will paste it), "Fetch" (they will give a URL), or "No" (just process what is already in `raw/`).

Question 2 (if they chose "Yes" or "Fetch") — "What type of source are you adding?" with options:
- If they give a URL: WebFetch it and save to `raw/` with frontmatter (source, clipped, status: unread).
- If it is a GitHub URL: use the `gh` CLI to fetch the README, issue body, or PR description. Save to `raw/` with frontmatter.
- If they paste text: save it to `raw/`.

After capturing, ask: "Want to add more sources, or start processing?" Loop until they say process.
| Source type | How to capture | Frontmatter fields |
|---|---|---|
| Web article / blog post | WebFetch the URL, extract content | source: <url>, author, clipped, status: unread |
| GitHub repository | gh repo view <repo> --json name,description,url + fetch README | source: <repo-url>, type: repo, clipped, status: unread |
| GitHub issue | gh issue view <url> --json title,body,author,labels,comments | source: <issue-url>, author, clipped, status: unread |
| GitHub PR | gh pr view <url> --json title,body,author,files,comments | source: <pr-url>, author, clipped, status: unread |
| arXiv paper | WebFetch the abstract page, or use /read-arxiv if available | source: <arxiv-url>, author, published, clipped, status: unread |
| YouTube video | WebFetch the page, extract title/description/transcript if accessible | source: <yt-url>, author, published, clipped, status: unread |
| Tweet / X post | WebFetch the URL | source: <tweet-url>, author, clipped, status: unread |
| Hacker News thread | WebFetch the URL, capture top comments | source: <hn-url>, clipped, status: unread |
| Reddit post | WebFetch the URL | source: <reddit-url>, author, clipped, status: unread |
| Documentation page | WebFetch the URL | source: <url>, clipped, status: unread |
| Local file (PDF, .md, .txt) | Read tool to read the file | source: <filepath>, clipped, status: unread |
| Pasted text / notes | User pastes directly in chat | source: manual, clipped, status: unread |
| Linear / Jira ticket | WebFetch or CLI if available | source: <ticket-url>, clipped, status: unread |
| Slack thread | User pastes the content | source: slack, clipped, status: unread |
| Book excerpt / quote | User pastes with attribution | source: manual, author, clipped, status: unread |
| API docs / OpenAPI spec | WebFetch the URL | source: <url>, clipped, status: unread |
| Conference talk / slides | WebFetch or user pastes notes | source: <url>, author, clipped, status: unread |
| Podcast transcript | WebFetch or user pastes | source: <url>, author, published, clipped, status: unread |
Filename for raw files: use <slugified-title>.md. If no title, use <domain>-<date>.md or manual-<date>.md.
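The filename rules above can be sketched as a tiny helper; the names `slugify` and `raw_filename` are illustrative, not part of any vault tooling:

```python
import re
from datetime import date
from typing import Optional

def slugify(title: str) -> str:
    """Lower-case, collapse non-alphanumerics to hyphens, trim hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")

def raw_filename(title: Optional[str], domain: Optional[str] = None) -> str:
    """Pick a raw-file name per the rules above: <slugified-title>.md,
    else <domain>-<date>.md, else manual-<date>.md."""
    if title:
        return f"{slugify(title)}.md"
    stem = domain or "manual"
    return f"{stem}-{date.today().isoformat()}.md"
```

A usage note: `raw_filename("As We May Think")` yields `as-we-may-think.md`, matching the kebab-case convention the vault uses elsewhere.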
Scan the raw sources folder (typically raw/) for files with status: unread in frontmatter, or files with no status field (treat as unread). Use Grep to find them:
Grep pattern: "status: unread" in raw/
Glob pattern: raw/*.md
Also check for files with no status field — any .md file in raw/ without a status property is unread.
List all unread items to the user showing: filename, title, source URL, clipped date. Then proceed to process them.
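The unread test above (explicit `status: unread`, or no `status` field at all) can be sketched as follows; the function names are illustrative:

```python
import re
from pathlib import Path

def is_unread(text: str) -> bool:
    """Unread means the frontmatter says 'status: unread',
    or the file carries no 'status' field at all."""
    m = re.match(r"^---\n(.*?)\n---", text, re.DOTALL)
    frontmatter = m.group(1) if m else ""
    status = re.search(r"^status:\s*(\S+)", frontmatter, re.MULTILINE)
    return status is None or status.group(1) == "unread"

def unread_files(raw_dir: str) -> list:
    """All unread .md files in the raw folder, sorted by name."""
    return [p for p in sorted(Path(raw_dir).glob("*.md"))
            if is_unread(p.read_text(encoding="utf-8"))]
```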
For each source file being ingested:
Create a wiki source page following the vault's schema. Typical path: wiki/sources/<slug>.md
The source page must include:
The source page must include:

- Frontmatter (title, type: source, created, updated, sources, tags)
- Wikilinks to related pages: [[entities/person-name]], [[concepts/idea-name]]

Naming: kebab-case slug derived from the title.
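A minimal source-page skeleton, assuming only the frontmatter fields listed above (titles, dates, and links are placeholders):

```markdown
---
title: Example Source
type: source
created: YYYY-MM-DD
updated: YYYY-MM-DD
sources: ["https://example.com/article"]
tags: [source]
---

# Example Source

Summary of the source, with inline links to [[entities/person-name]]
and [[concepts/idea-name]].
```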
For each significant entity (person, tool, project, organization) mentioned in the source:
- Check whether a page already exists in wiki/entities/ using Grep or Glob.
- If it exists: append this source to its sources list, add any new facts, add the source wikilink.
- If not: create one per the schema.

Only create entity pages for significant entities — not every name or tool mentioned in passing. Use judgment.
For each significant concept, pattern, or idea discussed:
- Check for an existing page in wiki/concepts/; create it if missing, or update it and append this source to its sources list.

For each ingested source, create a visual relationship diagram in Drawings/:
- File: Drawings/map-<source-slug>.excalidraw.md
- Each node's text contains a [[wikilink]] to its wiki page so clicking navigates there
- Frontmatter: excalidraw-plugin: parsed, tags: [excalidraw, map], excalidraw-autoexport: svg
- Embed it in the source page with ![[Drawings/map-<source-slug>.excalidraw]]

If this is the 2nd+ source ingested, also update or create Drawings/wiki-map.excalidraw.md — a master graph showing all sources, entities, and concepts with their connections. This grows with each ingest.
If the source connects, contradicts, or extends existing wiki content in a meaningful way, create a synthesis page in wiki/synthesis/. Synthesis pages should:
Only create synthesis pages when there's genuine cross-source insight. Don't force it.
Read wiki/index.md and verify it will surface the new pages. If index.md uses Dataview queries (recommended), no manual update is needed — the queries auto-update. If it's a manual list, add entries for each new page.
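If index.md relies on Dataview, a minimal query such as this (the folder path is assumed from the schema) lists source pages newest-first with no manual upkeep:

```dataview
LIST FROM "wiki/sources" SORT file.ctime DESC
```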
Append an entry to wiki/log.md:
```
## YYYY-MM-DD
- **Ingested**: "Source Title" (`raw/filename.md`)
- **Created**: [[sources/slug]], [[entities/person]], [[concepts/idea]]
- **Updated**: [[entities/existing-entity]] (added new source)
```
Edit the raw file's frontmatter to change status: unread to status: done. Do not modify any other content in the raw file — raw sources are immutable except for the status field.
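The status flip can be sketched as a targeted edit that leaves every other byte of the raw file alone; the names `flip_status` and `mark_done` are hypothetical:

```python
import re
from pathlib import Path

def flip_status(text: str) -> str:
    """Replace the first 'status: unread' line with 'status: done';
    the rest of the file stays byte-for-byte intact. Assumes the first
    such line is the frontmatter field."""
    return re.sub(r"^status:\s*unread\s*$", "status: done",
                  text, count=1, flags=re.MULTILINE)

def mark_done(path: str) -> None:
    p = Path(path)
    p.write_text(flip_status(p.read_text(encoding="utf-8")),
                 encoding="utf-8")
```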
Show the user a summary:
- Items processed from `raw/`
- Pages created and updated, with new facts attributed (from [[sources/slug]])

Conventions used throughout:

- Wikilinks use full paths: [[sources/karpathy-llm-wiki]], not [[karpathy-llm-wiki]]
- Use aliases for readability: [[entities/andrej-karpathy|Karpathy]]
- Frontmatter sources: sources: ["[[sources/slug]]"] or use plain slugs per the vault schema
- Attribute facts inline with (from [[sources/slug]])
- Dates are YYYY-MM-DD
- Filenames are kebab-case: vannevar-bush.md, not Vannevar Bush.md

Every installed plugin exists for a reason. Use them aggressively during ingest.
Include Dataview query blocks in every wiki page where they add value:
```dataview
LIST FROM "wiki" WHERE contains(file.outlinks, this.file.link)
```

After processing each source, automatically create an Excalidraw relationship diagram in Drawings/ that maps:
Use the ExcalidrawAutomate API pattern:
```js
// In a code block or via script — conceptual structure:
// 1. Central node: the source title
// 2. Entity nodes branching out (color-coded by type: person, tool, project, org)
// 3. Concept nodes branching out (different color)
// 4. Arrows labeled with the relationship
// 5. Each node text contains a [[wikilink]] to its page
```
Create the .excalidraw.md file directly with the proper structure:
```
---
excalidraw-plugin: parsed
tags: [excalidraw, map]
excalidraw-autoexport: svg
---
```