Extract structured knowledge from source material. Comprehensive extraction is the default — every insight that serves the domain gets extracted. For domain-relevant sources, skip rate must be below 10%. Zero extraction from a domain-relevant source is a BUG. Triggers on "/reduce", "/reduce [file]", "extract insights", "mine this", "process this".
Read these files to configure domain-specific behavior:
- ops/derivation-manifest.md — vocabulary mapping, extraction categories, platform hints
  - vocabulary.notes for the notes folder name
  - vocabulary.inbox for the inbox folder name
  - vocabulary.note for the note type name in output
  - vocabulary.note_plural for the plural form
  - vocabulary.reduce for the process verb in output
  - vocabulary.cmd_reflect for the next-phase command name
  - vocabulary.cmd_reweave for the backward-pass command name
  - vocabulary.cmd_verify for the verification command name
  - vocabulary.extraction_categories for the domain-specific extraction table
  - vocabulary.topic_map / vocabulary.topic_maps for MOC/topic map references
- ops/config.yaml — processing depth, pipeline chaining, selectivity
  - processing.depth: deep | standard | quick
  - processing.chaining: manual | suggested | automatic
  - processing.extraction.selectivity: strict | moderate | permissive
- ops/queue/queue.json — current task queue (for handoff mode)
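Taken together, the ops/config.yaml keys above might look like this (the values shown are illustrative defaults, not prescribed by the pipeline):

```yaml
processing:
  depth: standard            # deep | standard | quick
  chaining: suggested        # manual | suggested | automatic
  extraction:
    selectivity: moderate    # strict | moderate | permissive
```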
If these files don't exist (pre-init invocation or standalone use), use universal defaults:
notes/inbox/

You are the extraction engine. Raw source material enters. Structured, atomic {vocabulary.note_plural} exit. Everything between is your judgment — and that judgment must err toward extraction, not rejection.
| Concept | What It Means | Example |
|---|---|---|
| Having knowledge | The vault contains information | "We store notes in folders" |
| Articulated reasoning | The vault explains WHY something works as a traversable {vocabulary.note} | "folder structure mirrors cognitive chunking because..." |
Having knowledge is not the same as articulating it. Even if information is embedded in the system, the vault may lack the externalized reasoning explaining WHY it works. That reasoning is what you extract.
For domain-relevant sources, COMPREHENSIVE EXTRACTION is the default. This means:
Extract ALL core {vocabulary.note_plural} — direct assertions about the domain that can stand alone as atomic propositions.
Extract ALL evidence and validations — if source confirms an approach, that confirmation IS the {vocabulary.note}. Evidence is extractable even when the conclusion is already known, because the reasoning path matters.
Extract ALL patterns and methods — techniques, workflows, practices. Named patterns are referenceable. Unnamed intuitions are not.
Extract ALL tensions — contradictions, trade-offs, conflicts. These are wisdom, not problems.
Extract ALL enrichments — if source adds detail to existing {vocabulary.note_plural}, create enrichment tasks. Near-duplicates almost always add value.
"We already know this" means we NEED the articulation, not that we should skip it.
"Would a future session benefit from this reasoning being a retrievable {vocabulary.note}?"
If YES -> extract to appropriate category.
If NO -> verify it is truly off-topic before skipping.
For domain-relevant sources: skip rate < 10%. Zero extraction = BUG.
Target: $ARGUMENTS
Parse immediately:
--handoff: output RALPH HANDOFF block + task entries at end

Execute these steps:
- mcp__qmd__vector_search with query "[claim as sentence]", collection="{vocabulary.notes_collection}", limit=5
- Fallback: qmd vsearch "[claim as sentence]" --collection {vocabulary.notes_collection} -n 5
- If --handoff in target: create per-claim task files, update queue, output RALPH HANDOFF block

START NOW. Reference below explains methodology — use it to guide, not as output.
When you encounter friction, surprises, methodology insights, process gaps, or contradictions — capture IMMEDIATELY:
| Observation | Action |
|---|---|
| Any observation | Create atomic note in ops/observations/ with prose-sentence title |
| Tension: content contradicts existing {vocabulary.note} | Create atomic note in ops/tensions/ with prose-sentence title |
The handoff Learnings section summarizes what you ALREADY logged during processing.
Extract composable {vocabulary.note_plural} from source material into {vocabulary.notes}/.
Extract the REASONING behind what works, not just observations about what works.
This is the extraction phase of the pipeline. You receive raw content and extract insights that serve the vault's domain. The mission is building externalized, retrievable reasoning — a graph of atomic propositions that can be traversed, connected, and built upon.
THE CORE DISTINCTION:
| Concept | Example | What to Extract |
|---|---|---|
| We DO this | "We tag notes with topics" | — (not sufficient) |
| We explain WHY | "topic tagging enables cross-domain navigation because..." | This |
The vault is not just an implementation. It is the articulated argument for WHY the implementation works.
THE EXTRACTION QUESTION:
If YES -> extract to appropriate category (even if "we already know this").
If NO -> skip (RARE for domain-relevant sources — verify it is truly off-topic).
THE RULE: Implementation without articulation is incomplete. If we DO something but lack a {vocabulary.note} explaining WHY it works, that articulation needs extraction.
{DOMAIN:extraction_categories}
The structural invariant: regardless of domain, every extraction run covers these universal categories:
| Category | What to Find | Output Type | Gate Required? |
|---|---|---|---|
| Core domain {vocabulary.note_plural} | Direct assertions about {vocabulary.domain} | {vocabulary.note} | NO |
| Patterns | Recurring structures across sources | {vocabulary.note} | NO |
| Comparisons | How different approaches compare, X vs Y, trade-offs | {vocabulary.note} | NO |
| Tensions | Contradictions, conflicts, unresolved trade-offs | tension note | NO |
| Anti-patterns | What breaks, what to avoid, failure modes | problem note | NO |
| Enrichments | Content that adds detail to existing {vocabulary.note_plural} | enrichment task | NO |
| Open questions | Unresolved questions worth tracking | {vocabulary.note} (open) | NO |
| Implementation ideas | Techniques, workflows, features to build | methodology note | NO |
| Validations | Evidence confirming an approach works | {vocabulary.note} | NO |
| Off-topic general content | Insight unrelated to {vocabulary.domain} | apply selectivity gate | YES |
IMPORTANT: Categories 1-9 bypass the selectivity gate. They extract directly to the appropriate output type. The selectivity gate exists ONLY for filtering off-topic content from general sources.
Hunt for these signals in every source:
Core domain signals:
Comparison signals:
Tension signals:
Anti-pattern signals:
Enrichment signals:
Implementation signals:
Validation signals:
For EVERY candidate, ask: "Does this serve {vocabulary.domain}?"
For domain-relevant sources: almost everything is YES. The gate barely applies. Skip rate < 10%.
CRITICAL: This gate exists to filter OUT content that does not serve {vocabulary.domain}. It applies ONLY to standard claims from GENERAL (off-topic) sources.
Do NOT use gate to reject:
For STANDARD claims from general sources, verify all four criteria pass:
The claim is understandable without source context. Someone reading this {vocabulary.note} cold can grasp what it argues without needing to know where it came from.
Fail: "the author's third point about methodology"
Pass: "explicit structure beats implicit convention"
This {vocabulary.note} would be linked FROM elsewhere. {vocabulary.note_plural} function as APIs. If you cannot imagine writing since [[this claim]]... in another {vocabulary.note}, it is not composable.
Fail: a summary of someone's argument
Pass: a claim you could invoke while building your own argument
Not already captured in the vault. Semantic duplicate check AND existing {vocabulary.note_plural} scan both clear.
Fail: semantically equivalent to an existing {vocabulary.note}
Pass: genuinely new angle not yet articulated
Relates to existing thinking in the vault. Isolated insights that do not connect to anything are orphans. They rot.
Fail: interesting observation about unrelated domain
Pass: extends, contradicts, or deepens existing {vocabulary.note_plural}
If ANY criterion fails: do not extract.
Before reading the source, understand what already exists:
# Get descriptions from existing notes
for f in {vocabulary.notes}/*.md; do
[[ -f "$f" ]] && echo "=== $(basename "$f" .md) ===" && rg "^description:" "$f" -A 0
done
Scan descriptions to understand current {vocabulary.note_plural}. This prevents duplicate extraction and helps identify connection points and enrichment opportunities.
Read the ENTIRE source. Understand what it contains, what it argues, what domain it serves.
Planning the extraction:
Explicit signal phrases to hunt:
Implicit signals (the best insights often hide in):
What you are hunting:
STOP. Before ANY filtering, determine the category of each candidate.
This is the critical step that prevents over-rejection. Categorize FIRST, then route to the appropriate extraction path.
| Category | How to Identify | Route To |
|---|---|---|
| Core domain {vocabulary.note} | Direct assertion about {vocabulary.domain} | -> {vocabulary.note} (SKIP selectivity gate) |
| Implementation idea | Describes a feature, tool, system, or workflow to build | -> methodology note (SKIP selectivity gate) |
| Tension/challenge | Describes a conflict, risk, or trade-off | -> tension note (SKIP selectivity gate) |
| Validation | Evidence confirming an approach works | -> {vocabulary.note} (SKIP selectivity gate) |
| Near-duplicate | Semantic search finds related vault {vocabulary.note} | -> evaluate for enrichment task |
| Off-topic claim | General insight not about {vocabulary.domain} | -> apply selectivity gate |
CRITICAL: Implementation ideas, tensions, validations, and domain {vocabulary.note_plural} do NOT need to pass the 4-criterion selectivity gate. The gate is for off-topic filtering ONLY.
Why this matters: The selectivity gate was designed for filtering general insights. But implementation ideas ("build a trails feature"), tensions ("optimization vs readability trade-off"), and validations ("research confirms our approach") are DIFFERENT output types that serve different purposes. Applying the selectivity gate to them is a category error.
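The categorize-first routing above can be sketched as a small dispatch function. This is a minimal illustration, not part of the pipeline; the shorthand category names are assumptions, not required vocabulary.

```shell
#!/bin/sh
# Route a categorized candidate to its extraction path.
# Only off-topic claims reach the selectivity gate; everything else extracts directly.
route_candidate() {
  case "$1" in
    core-claim|implementation-idea|tension|validation)
      echo "extract (skip selectivity gate)" ;;
    near-duplicate)
      echo "evaluate for enrichment task" ;;
    off-topic)
      echo "apply selectivity gate" ;;
    *)
      echo "categorize before filtering" ;;
  esac
}

route_candidate tension
route_candidate off-topic
```

Note the deliberate asymmetry: four of the five known categories never see the gate, which is what prevents the over-rejection the text warns about.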
For each candidate, run duplicate detection:
mcp__qmd__vector_search query="[proposed claim as sentence]" collection="{vocabulary.notes_collection}" limit=5
If MCP is unavailable, run:
qmd vsearch "[proposed claim as sentence]" --collection {vocabulary.notes_collection} -n 5
If qmd CLI is unavailable, fall back to keyword grep duplicate checks.
Why vector (semantic) search instead of keyword search: Duplicate detection is where keyword search fails hardest. A claim about "friction in systems" will not find "resistance to change" via keyword matching even though they may be semantic duplicates. Vector search (~5s) catches same-concept-different-words duplicates that keyword search misses entirely. For a batch of 30-50 candidates, this adds ~3 minutes total — worth it to catch duplicates early rather than discovering them during {vocabulary.cmd_reflect}.
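The MCP -> CLI -> grep fallback chain can be sketched as follows. The sample note, folder name, and grep keywords are illustrative only; in real use the claim sentence comes from the candidate being checked.

```shell
#!/bin/sh
# Set up an illustrative note so the sketch is self-contained.
mkdir -p notes
printf -- '---\ndescription: explicit structure beats implicit convention\n---\n' \
  > notes/example.md

claim="explicit structure beats implicit convention"
if command -v qmd >/dev/null 2>&1; then
  # Preferred fallback: semantic search via the qmd CLI.
  qmd vsearch "$claim" --collection notes -n 5
else
  # Last resort: keyword grep over notes.
  # Weaker than semantic search — misses same-concept-different-words duplicates.
  grep -rl "explicit structure" notes/
fi
```

The grep branch exists so the protocol degrades rather than silently skipping duplicate detection, but its misses are exactly the semantic duplicates the paragraph above describes.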
Scores are signals, not decisions. For ANY result with a relevant title or snippet:
The Enrichment Judgment (DEFAULT TO ENRICHMENT):
| Situation | Action |
|---|---|
| Exact text already exists | SKIP (truly identical — RARE) |
| Same claim, different words, source adds nothing | SKIP (verify by re-reading existing {vocabulary.note}) |
| Same claim, source has MORE detail/examples/framing | -> ENRICHMENT TASK (update existing {vocabulary.note}) |
| Same topic, DIFFERENT claim | -> EXTRACT as new {vocabulary.note}, flag for cross-linking |
| Related mechanism, different scope | -> EXTRACT as new {vocabulary.note}, flag for cross-linking |
DEFAULT TO ENRICHMENT. If source mentions the same topic, it almost certainly adds something. Truly identical content is RARE.
MANDATORY protocol when semantic search finds overlap:
Near-duplicates are opportunities, not rejections. Creating enrichment tasks is CORRECT behavior. If you are skipping near-duplicates without enrichment tasks, you are probably wrong.
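An enrichment task can be as small as a file noting which existing {vocabulary.note} to update and what the source adds. The ops/enrichments/ path, filename, and field names below are assumptions for illustration, not a prescribed format.

```shell
#!/bin/sh
# Record a near-duplicate as an enrichment task instead of skipping it.
mkdir -p ops/enrichments
cat > ops/enrichments/enrich-topic-tagging.md <<'EOF'
target: [[topic tagging enables cross-domain navigation]]
adds: concrete cross-domain traversal example from the source
EOF
```

Writing the task down at scan time keeps the enrichment judgment cheap: the later pass reads the target note, folds in the addition, and deletes the task file.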
Every extracted candidate gets classified:
Classification affects downstream handling but does NOT affect whether to extract. Both open and closed candidates get extracted.
Report what you found by category. Include counts:
Extraction scan complete.
SUMMARY:
- {vocabulary.note_plural}: N
- implementation ideas: N
- tensions: N
- enrichment tasks: N
- validations: N
- open questions: N
- skipped: N
- TOTAL OUTPUTS: N
---
CLAIMS ({vocabulary.note_plural}):
1. [claim as sentence] — connects to [[existing note]]
2. [claim as sentence] — extends [[existing note]]
...
IMPLEMENTATION IDEAS (methodology notes):
1. [feature/pattern] — what it enables, why it matters
...
TENSIONS (tension notes):
1. [X vs Y] — the conflict, why it matters
...
ENRICHMENT TASKS (update existing {vocabulary.note_plural}):
1. [[existing note]] — source adds [what is missing]
...
SKIPPED (truly nothing to add):
- [description] — why nothing extractable
Wait for user approval before creating files. Never auto-extract.
For each approved {vocabulary.note}:
a. Craft the title
The title IS the claim. Express the concept in exactly the words that capture it.
Test: "this {vocabulary.note} argues that [title]"
Good: "explicit structure beats implicit convention for agent navigation"
Good: "small differences compound through repeated selection"
Bad: "context management strategies" (topic label, not a claim)
b. Write the {vocabulary.note}
---