any input (code, docs, papers, images) → knowledge graph → clustered communities → HTML + JSON + audit report
Turn any folder of files into a navigable knowledge graph with community detection, an honest audit trail, and three outputs: interactive HTML, GraphRAG-ready JSON, and a plain-language GRAPH_REPORT.md.
/graphify # full pipeline on current directory → Obsidian vault
/graphify <path> # full pipeline on specific path
/graphify <path> --mode deep # thorough extraction, richer INFERRED edges
/graphify <path> --update # incremental - re-extract only new/changed files
/graphify <path> --directed # build directed graph (preserves edge direction: source→target)
/graphify <path> --whisper-model medium # use a larger Whisper model for better transcription accuracy
/graphify <path> --cluster-only # rerun clustering on existing graph
/graphify <path> --no-viz # skip visualization, just report + JSON
/graphify <path> --html # (HTML is generated by default - this flag is a no-op)
/graphify <path> --svg # also export graph.svg (embeds in Notion, GitHub)
/graphify <path> --graphml # export graph.graphml (Gephi, yEd)
/graphify <path> --neo4j # generate graphify-out/cypher.txt for Neo4j
/graphify <path> --neo4j-push bolt://localhost:7687 # push directly to Neo4j
/graphify <path> --mcp # start MCP stdio server for agent access
/graphify <path> --watch # watch folder, auto-rebuild on code changes (no LLM needed)
/graphify <path> --wiki # build agent-crawlable wiki (index.md + one article per community)
/graphify <path> --obsidian --obsidian-dir ~/vaults/my-project # write vault to custom path (e.g. existing vault)
/graphify add <url> # fetch URL, save to ./raw, update graph
/graphify add <url> --author "Name" # tag who wrote it
/graphify add <url> --contributor "Name" # tag who added it to the corpus
/graphify query "<question>" # BFS traversal - broad context
/graphify query "<question>" --dfs # DFS - trace a specific path
/graphify query "<question>" --budget 1500 # cap answer at N tokens
/graphify path "AuthModule" "Database" # shortest path between two concepts
/graphify explain "SwinTransformer" # plain-language explanation of a node
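The BFS/DFS distinction in `query` is standard graph traversal over the exported graph. A minimal sketch of the BFS side, using an illustrative adjacency dict rather than graphify's actual graph.json schema:

```python
from collections import deque

def bfs_context(adj: dict[str, list[str]], start: str, max_nodes: int = 8) -> list[str]:
    """Breadth-first: gather broad context around the start node."""
    seen, order, queue = {start}, [], deque([start])
    while queue and len(order) < max_nodes:
        node = queue.popleft()
        order.append(node)
        for nbr in adj.get(node, []):
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return order

# Illustrative graph; the real one is loaded from graphify-out/graph.json.
adj = {"AuthModule": ["Session", "Database"], "Session": ["Cache"], "Database": []}
print(bfs_context(adj, "AuthModule"))  # → ['AuthModule', 'Session', 'Database', 'Cache']
```

DFS is the same loop with a stack (pop from the end) instead of a deque, which is why it traces one path deeply instead of fanning out.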
graphify is built around Andrej Karpathy's /raw folder workflow: drop anything into a folder - papers, tweets, screenshots, code, notes - and get a structured knowledge graph that shows you what you didn't know was connected.
Three things it does that Claude alone cannot:
Results persist in graphify-out/graph.json and survive across sessions, so you can ask questions weeks later without re-reading everything.
Use it for:
If no path was given, use . (current directory). Do not ask the user for a path.
Follow these steps in order. Do not skip steps.
# Detect the correct Python interpreter (handles pipx, venv, system installs)
GRAPHIFY_BIN=$(which graphify 2>/dev/null)
if [ -n "$GRAPHIFY_BIN" ]; then
PYTHON=$(head -1 "$GRAPHIFY_BIN" | sed 's/^#!//')  # extract the interpreter from the shebang line
case "$PYTHON" in
*[!a-zA-Z0-9/_.-]*) PYTHON="python3" ;;
esac
else
PYTHON="python3"
fi
"$PYTHON" -c "import graphify" 2>/dev/null || "$PYTHON" -m pip install graphifyy -q 2>/dev/null || "$PYTHON" -m pip install graphifyy -q --break-system-packages 2>&1 | tail -3
# Write interpreter path for all subsequent steps (persists across invocations)
mkdir -p graphify-out
"$PYTHON" -c "import sys; open('graphify-out/.graphify_python', 'w').write(sys.executable)"
If the import succeeds, print nothing and move straight to Step 2.
In every subsequent bash block, replace python3 with $(cat graphify-out/.graphify_python) to use the correct interpreter.
$(cat graphify-out/.graphify_python) -c "
import json
from graphify.detect import detect
from pathlib import Path
result = detect(Path('INPUT_PATH'))
print(json.dumps(result))
" > graphify-out/.graphify_detect.json
Replace INPUT_PATH with the actual path the user provided. Do NOT cat or print the JSON - read it silently and present a clean summary instead:
Corpus: X files · ~Y words
code: N files (.py .ts .go ...)
docs: N files (.md .txt ...)
papers: N files (.pdf ...)
images: N files
video: N files (.mp4 .mp3 ...)
Omit any category with 0 files from the summary.
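As a sketch, that summary can be assembled from the detect JSON (field names follow the ones used later in this document, total_files, total_words, and a files dict keyed by category, but verify against the real output):

```python
from pathlib import Path

def summarize(detect: dict) -> str:
    """Render the corpus summary, omitting empty categories."""
    lines = [f"Corpus: {detect['total_files']} files · ~{detect['total_words']:,} words"]
    for category, files in detect.get("files", {}).items():
        if not files:
            continue  # categories with 0 files are omitted
        exts = sorted({Path(f).suffix for f in files if Path(f).suffix})
        lines.append(f"  {category}: {len(files)} files ({' '.join(exts)})")
    return "\n".join(lines)

# In the pipeline the dict comes from the detect step:
#   detect = json.loads(Path("graphify-out/.graphify_detect.json").read_text())
sample = {
    "total_files": 3, "total_words": 1200,
    "files": {"code": ["a.py", "b.ts"], "docs": ["notes.md"], "video": []},
}
print(summarize(sample))
```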
Then act on it:
- total_files is 0: stop with "No supported files found in [path]."
- skipped_sensitive is non-empty: mention the count of skipped files, not their names.
- total_words > 2,000,000 OR total_files > 200: show the warning and the top 5 subdirectories by file count, then ask which subfolder to run on. Wait for the user's answer before proceeding.

Skip this step entirely if detect returned zero video files.
Video and audio files cannot be read directly. Transcribe them to text first, then treat the transcripts as doc files in Step 3.
Strategy: Read the god nodes from graphify-out/.graphify_detect.json (or the analysis file if it exists from a previous run). You are already a language model — write a one-sentence domain hint yourself from those labels. Then pass it to Whisper as the initial prompt. No separate API call needed.
However, if the corpus has only video files and no other docs/code, use the generic fallback prompt: "Use proper punctuation and paragraph breaks."
Step 1 - Write the Whisper prompt yourself.
Read the top god node labels from detect output or analysis, then compose a short domain hint sentence, for example:
- transformer, attention, encoder, decoder → "Machine learning research on transformer architectures and attention mechanisms. Use proper punctuation and paragraph breaks."
- kubernetes, deployment, pod, helm → "DevOps discussion about Kubernetes deployments and Helm charts. Use proper punctuation and paragraph breaks."

Set it as GRAPHIFY_WHISPER_PROMPT to use in the next command.
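A minimal sketch of that composition step (purely illustrative; graphify's detect output may expose the top node labels under a different key):

```python
def compose_whisper_prompt(labels: list[str]) -> str:
    """Build a Whisper initial_prompt from the corpus's dominant node labels."""
    if not labels:
        # Video-only corpus: fall back to the generic prompt.
        return "Use proper punctuation and paragraph breaks."
    topic = ", ".join(labels[:5])  # a handful of labels is enough of a domain hint
    return f"A discussion about {topic}. Use proper punctuation and paragraph breaks."

print(compose_whisper_prompt(["kubernetes", "deployment", "helm"]))
```

Export the result as GRAPHIFY_WHISPER_PROMPT so the transcription command below picks it up.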
Step 2 - Transcribe:
export GRAPHIFY_WHISPER_MODEL=base  # or whatever --whisper-model the user passed
export GRAPHIFY_WHISPER_PROMPT="<the domain hint composed in Step 1>"
$(cat graphify-out/.graphify_python) -c "
import json, os
from pathlib import Path
from graphify.transcribe import transcribe_all
detect = json.loads(Path('graphify-out/.graphify_detect.json').read_text())
video_files = detect.get('files', {}).get('video', [])
prompt = os.environ.get('GRAPHIFY_WHISPER_PROMPT', 'Use proper punctuation and paragraph breaks.')
transcript_paths = transcribe_all(video_files, initial_prompt=prompt)
print(json.dumps(transcript_paths))
" > graphify-out/.graphify_transcripts.json
After transcription:
- Transcript paths are written to graphify-out/.graphify_transcripts.json
- Report: Transcribed N video file(s) -> treating as docs
- Whisper model: default is base. If the user passed --whisper-model <name>, set GRAPHIFY_WHISPER_MODEL=<name> in the environment before running the command above.
Before starting: note whether --mode deep was given. You must pass DEEP_MODE=true to every subagent in Step B2 if it was. Track this from the original invocation - do not lose it.
This step has two parts: structural extraction (deterministic, free) and semantic extraction (Claude, costs tokens).
Run Part A (AST) and Part B (semantic) in parallel. Dispatch all semantic subagents AND start AST extraction in the same message. Both can run simultaneously since they operate on different file types. Merge results in Part C as before.
Note: Parallelizing AST + semantic saves 5-15s on large corpora. AST is deterministic and fast; start it while subagents are processing docs/papers.
For any code files detected, run AST extraction in parallel with Part B subagents:
$(cat graphify-out/.graphify_python) -c "
import sys, json
from graphify.extract import collect_files, extract
from pathlib import Path
code_files = []
detect = json.loads(Path('graphify-out/.graphify_detect.json').read_text())
for f in detect.get('files', {}).get('code', []):
code_files.extend(collect_files(Path(f)) if Path(f).is_dir() else [Path(f)])
if code_files:
result = extract(code_files, cache_root=Path('.'))
Path('graphify-out/.graphify_ast.json').write_text(json.dumps(result, indent=2))
print(f'AST: {len(result[\"nodes\"])} nodes, {len(result[\"edges\"])} edges')