Query the wiki as a graph without maintaining a graph DB. Structured frontmatter queries, neighbor traversal, shortest paths, hub/orphan detection — each call scans the .md files fresh and computes the answer on the fly. Defaults to active-tier pages when memory tiers are enabled.
Structured graph queries over the wiki — without a persistent graph store.
The wiki already has a knowledge graph: every [[wikilink]] is an edge, every
page is a node, every frontmatter field is a property. This skill just computes
over that graph on demand. Each invocation scans the markdown files, builds an
in-memory graph, runs the query, prints the answer, and exits. Nothing is
cached — the files are the source of truth, always.
Karpathy's wiki treats the filesystem as the database.
wiki-graph treats the filesystem as the graph.
Example questions this skill answers:
- "type: model pages tagged transformer updated since March"
- "What links to [[claude-3]]?"
- "[[gpt-4]] and [[llama-3]] — do any concepts bridge them?"

wiki-search is for "find pages about X" (ranked text relevance).
wiki-graph is for "pages where property P holds" and "pages connected
to X" (structural queries).
Orientation is not required — this skill runs cold against the file layout. It reads SCHEMA.md only to know which frontmatter fields are reserved types vs. free-form strings.
read_file {wiki_path}/SCHEMA.md (optional — for type field names)
① Parse arguments. Exactly one of --query, --neighbors,
--shortest-path, --hubs, --orphans, --cluster must be set
(they are mutually exclusive query modes). --format and --limit apply to all
modes.
② Scan the wiki (shared by all modes):
Walk every .md file outside raw/. For each file extract:
- frontmatter fields — the node's properties
- [[wikilinks]] in the body — the outbound edges
- the memory tier, when memory_tiers is enabled: pick the driving field, compare its age to the thresholds, apply any tier_override: pin

Tier filtering depends on mode:
- --query, --hubs, --orphans, --cluster — filter nodes by --tier (default active when tiers are enabled, else all). These are ambient-state queries and should reflect the "focus" surface.
- --neighbors, --shortest-path — include all tiers by default. The user named a specific seed, so connectivity matters more than surface focus. Each result annotates its tier ([active]/[archived]/[frozen]) so the user can eyeball where archived bridge concepts live. Pass --tier=active explicitly to restrict.

Build three in-memory structures:
nodes: {id → {properties}}
out_edges: {id → [target_id, ...]}
in_edges: {id → [source_id, ...]} # inverted while walking
Resolve [[wikilinks]] to node ids by matching the link text against page
titles (frontmatter title) first, then filename stems. Unresolved links
become dangling edges — reported in the --orphans / wiki-lint modes but
not treated as nodes.
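The scan step above can be sketched as follows. This is a minimal sketch, not the skill's actual implementation: it assumes simple `key: value` frontmatter (a real version would use a YAML parser), and the names `scan` and `WIKILINK` are illustrative.

```python
import re
from pathlib import Path

WIKILINK = re.compile(r"\[\[([^\]|#]+)")  # link target, ignoring alias/anchor

def scan(wiki_path):
    """Build the in-memory graph: nodes, out_edges, in_edges."""
    nodes, out_edges, in_edges, bodies = {}, {}, {}, {}
    for path in Path(wiki_path).rglob("*.md"):
        if "raw" in path.parts:                  # skip the raw/ directory
            continue
        text = path.read_text(encoding="utf-8")
        props, body = {}, text
        if text.startswith("---"):               # naive frontmatter split
            head, _, body = text[3:].partition("\n---")
            for line in head.splitlines():
                if ":" in line:
                    k, _, v = line.partition(":")
                    props[k.strip()] = v.strip()
        nodes[path.stem] = props
        bodies[path.stem] = body
    # resolve links: frontmatter title first, then filename stem
    titles = {p["title"]: nid for nid, p in nodes.items() if p.get("title")}
    for nid, body in bodies.items():
        for link in WIKILINK.findall(body):
            target = titles.get(link.strip(), link.strip())
            if target in nodes:                  # unresolved links stay dangling
                out_edges.setdefault(nid, []).append(target)
                in_edges.setdefault(target, []).append(nid)  # inverted while walking
    return nodes, out_edges, in_edges
```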
③ Dispatch to the query mode.
--query "<expression>"

Dataview-style filter over frontmatter. Supports:
- equality: type: entity
- membership: tags contains transformer
- comparison: updated > 2025-01, sources >= 3
- boolean operators: AND, OR, NOT

Examples:
--query "type: entity AND tags contains model AND updated > 2025-01"
--query "type: comparison OR (type: concept AND tags contains attention)"
--query "NOT (tags contains stub) AND sources >= 3"
Evaluation: iterate nodes, evaluate expression against each node's properties,
collect matches. Sort by updated descending by default (override with
--sort=<field>).
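A sketch of how such an expression could be evaluated against a node's frontmatter, assuming the operator set listed above (the function names `eval_cond`, `_split_top`, and `eval_expr` are illustrative, not part of the skill):

```python
import re

def eval_cond(cond, props):
    """One atomic condition: 'type: entity', 'tags contains model',
    'updated > 2025-01', 'sources >= 3'."""
    if m := re.fullmatch(r"(\w+)\s+contains\s+(\S+)", cond):
        field, needle = m.groups()
        return needle in (props.get(field) or [])
    if m := re.fullmatch(r"(\w+)\s*(>=|<=|>|<)\s*(\S+)", cond):
        field, op, raw = m.groups()
        val = props.get(field)
        if val is None:
            return False
        try:
            left, right = float(val), float(raw)
        except ValueError:
            left, right = str(val), raw  # ISO dates compare lexicographically
        return {">": left > right, "<": left < right,
                ">=": left >= right, "<=": left <= right}[op]
    if m := re.fullmatch(r"(\w+)\s*:\s*(\S+)", cond):
        field, want = m.groups()
        return str(props.get(field)) == want
    raise ValueError(f"cannot parse condition: {cond!r}")

def _split_top(expr, keyword):
    """Split on a boolean keyword that sits outside all parentheses."""
    depth, start, parts, i = 0, 0, [], 0
    pat = re.compile(rf"\b{keyword}\b")
    while i < len(expr):
        if expr[i] == "(":
            depth += 1
        elif expr[i] == ")":
            depth -= 1
        elif depth == 0 and (m := pat.match(expr, i)):
            parts.append(expr[start:i])
            i = start = m.end()
            continue
        i += 1
    parts.append(expr[start:])
    return parts

def eval_expr(expr, props):
    """Precedence, loosest first: OR < AND < NOT < (...) < atomic condition."""
    expr = expr.strip()
    if len(parts := _split_top(expr, "OR")) > 1:
        return any(eval_expr(p, props) for p in parts)
    if len(parts := _split_top(expr, "AND")) > 1:
        return all(eval_expr(p, props) for p in parts)
    if re.match(r"NOT\b", expr):
        return not eval_expr(expr[3:], props)
    if expr.startswith("(") and expr.endswith(")"):
        return eval_expr(expr[1:-1], props)
    return eval_cond(expr, props)
```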
--neighbors <page> [--depth=N]

BFS from the starting page over out_edges ∪ in_edges (undirected traversal —
a wiki link is a bidirectional relation in practice, even if stored one-way).
Default depth is 1; cap at 4 to keep output readable.
Return the layered neighborhood:
depth 0: [[claude-3]] (seed)
depth 1: [[anthropic]], [[constitutional-ai]], [[rlhf]]
depth 2: [[alignment]], [[dario-amodei]], ...
Each neighbor line also carries its type and top 2 tags so the user can scan
for the ones that matter.
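The layered traversal could look like this (a sketch over the `out_edges`/`in_edges` structures from step ②; the name `neighbors` is illustrative):

```python
def neighbors(seed, out_edges, in_edges, depth=1):
    """Layered undirected BFS: returns {distance: [node, ...]}, depth capped at 4."""
    adj = lambda n: set(out_edges.get(n, [])) | set(in_edges.get(n, []))
    seen, layers, frontier = {seed}, {0: [seed]}, [seed]
    for d in range(1, min(depth, 4) + 1):
        nxt = sorted({m for n in frontier for m in adj(n)} - seen)
        if not nxt:          # neighborhood exhausted before reaching depth
            break
        layers[d] = nxt
        seen |= set(nxt)
        frontier = nxt
    return layers
```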
--shortest-path <a>,<b>

Undirected BFS from a to b. If no path exists, say so and suggest the
closest candidates (smallest BFS frontier intersection). If multiple shortest
paths of the same length exist, return up to 3.
Output the path with edge labels (frontmatter type of each node):
[[claude-3]] (entity) → [[rlhf]] (concept) → [[instruct-gpt]] (entity) → [[gpt-4]] (entity)
This is often the most useful mode — it surfaces bridge concepts between two entities the user didn't realize were connected.
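A minimal sketch of the path search (it returns a single shortest path and omits the tied-paths and closest-candidates behavior described above; `shortest_path` is an illustrative name):

```python
from collections import deque

def shortest_path(a, b, out_edges, in_edges):
    """Undirected BFS from a to b; returns the node path, or None if unreachable."""
    adj = lambda n: set(out_edges.get(n, [])) | set(in_edges.get(n, []))
    prev, queue = {a: None}, deque([a])
    while queue:
        n = queue.popleft()
        if n == b:                       # walk predecessors back to the seed
            path = []
            while n is not None:
                path.append(n)
                n = prev[n]
            return path[::-1]
        for m in sorted(adj(n)):
            if m not in prev:
                prev[m] = n
                queue.append(m)
    return None
```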
--hubs [--limit=20]

Sort all nodes by |in_edges| + 0.5·|out_edges| descending (inbound links
matter more — being linked to is a stronger signal than linking out). Return
the top --limit with their link counts.
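The scoring is a one-liner over the structures from step ② (`hubs` is an illustrative name; ties broken alphabetically here, which the doc does not specify):

```python
def hubs(nodes, out_edges, in_edges, limit=20):
    """Score = inbound + 0.5 * outbound; inbound links weigh more."""
    score = lambda n: len(in_edges.get(n, [])) + 0.5 * len(out_edges.get(n, []))
    return sorted(nodes, key=lambda n: (-score(n), n))[:limit]
```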
--orphans

Two classes: true orphans (pages with no inbound links) and leaves (pages with inbound links but no outbound links). Report both separately. True orphans are almost always bugs (the ingest forgot to cross-reference). Leaves are sometimes intentional (a stub page waiting to be fleshed out).
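Under the assumed definitions above (true orphan = no inbound links, leaf = inbound but no outbound), detection is two comprehensions:

```python
def orphans(nodes, out_edges, in_edges):
    """Split pages into true orphans (no inbound links) and leaves
    (inbound links but no outbound links) -- assumed definitions."""
    true_orphans = [n for n in nodes if not in_edges.get(n)]
    leaves = [n for n in nodes if in_edges.get(n) and not out_edges.get(n)]
    return true_orphans, leaves
```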
--cluster=<tag>

Collect all nodes carrying <tag> in frontmatter. Report:
- link density: intra_edges / (N·(N-1)/2) — high density means the cluster is self-referential; low density means it's a loose grouping

Offers a quick way to ask "is this tag actually a coherent topic, or just a shelf?"
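The density formula divides by undirected pairs, so a sketch should count each linked pair once regardless of link direction (an assumption; `cluster_density` is an illustrative name):

```python
def cluster_density(members, out_edges):
    """Fraction of possible intra-cluster pairs that are linked:
    intra_edges / (N * (N - 1) / 2), counting each pair once."""
    members = set(members)
    n = len(members)
    if n < 2:
        return 0.0
    pairs = {frozenset((a, b))
             for a in members
             for b in out_edges.get(a, [])
             if b in members and b != a}
    return len(pairs) / (n * (n - 1) / 2)
```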
④ Format output.
--format=text (default) — human-readable, as in the excerpts above
--format=json — machine-readable, for piping to other tools (e.g. jq)
--format=mermaid — a graph TD block the user can paste into Obsidian or
GitHub markdown. Best with --neighbors, --shortest-path, or --cluster.

Mermaid example for --neighbors claude-3 --depth=2 --format=mermaid:
graph TD
claude-3[claude-3] --> anthropic
claude-3 --> constitutional-ai
claude-3 --> rlhf
constitutional-ai --> alignment
rlhf --> instruct-gpt
anthropic --> dario-amodei
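Emitting such a block from a neighbor query result could look like this (a sketch; `to_mermaid` is an illustrative name, and it assumes node ids are already valid Mermaid identifiers, as in the example above):

```python
def to_mermaid(layers, out_edges):
    """Render a layered neighborhood ({distance: [node, ...]}) as Mermaid graph TD,
    keeping only edges whose endpoints both fall inside the neighborhood."""
    keep = {n for layer in layers.values() for n in layer}
    lines = ["graph TD"]
    for a in sorted(keep):
        for b in sorted(out_edges.get(a, [])):
            if b in keep:
                lines.append(f"    {a} --> {b}")
    return "\n".join(lines)
```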
⑤ Suggested next actions. Based on what the query found:
- wiki-lint --fix to surface cross-reference candidates
- wiki-query --file "synthesis of {tag}"
- wiki-query to write that connection up as a comparison page

[Operation] wiki-graph | {mode}: {argument}
[Scanned] {N} pages, {M} edges ({K} dangling)
[Results]
{mode-specific body}
[Summary]
{1–2 sentence interpretation of the result}
[Suggested next]
→ {next skill invocation}
A persistent graph store (Neo4j, an index file, a pickled networkx) would be faster for very large wikis, but it introduces a second source of truth. The moment the agent updates a page and forgets to update the graph, queries silently lie.
Karpathy's design keeps the filesystem as the only source of truth. Every
query is computed from the current state of the files. wiki-graph honors that