This skill should be used when the user asks to "extract knowledge", "condense this text", "summarize for agents", "create a knowledge file", "compress this document", "extract principles", "distill this", or provides text/files to be converted into token-efficient knowledge representations for LLM agent consumption.
Extract, condense, and structure knowledge from any text source into token-efficient representations optimized for LLM agent consumption. Output includes machine-readable metadata for discoverability and a compressed body preserving all distinct ideas.
## When to Use

- Extracting reusable knowledge from articles, reports, conversations, documentation
- Creating agent-readable knowledge files from verbose source material
- Building a knowledge base where token budget matters
- Converting human-oriented docs into agent-oriented references
## Core Process

### Phase 1: Intake & Schema Selection
- **Identify source.** Accept text inline, file path, or URL. Record the source identifier for metadata.
- **Classify domain.** Determine the primary domain and tags (see tag taxonomy in references/output-format.md).
- **Choose output schema.** Default to the Core schema from references/output-format.md. Use a preset only when it clearly improves compression or scannability, or reduces ambiguity relative to plain Core for this source.
- **Map information roles before compressing.** Separate the source into:
  - key claims
  - support structure: reasoning, mechanisms, processes, dependencies, chronology
  - applicability signals: assumptions, caveats, scope, audience, time bounds
  - evidence and examples
  - low-value narrative or setup

  Compression is allowed to delete low-value narrative/setup, not the other four categories. If a detail changes meaning, applicability, confidence, or reconstructability, it is not "mere context."
| Preset | Typical sources | Signal to use it | Closing section |
|---|---|---|---|
| Ideation | — | Most items are possibilities, constraints, or unknowns | Risks & Open Questions |
| Analytical | Research, evaluations, comparisons, postmortems | Most items are findings backed by evidence | Limitations & Open Questions |
| Decisional | Meeting notes, ADRs, tradeoff records | Most items are choices plus reasons | Revisit Conditions |
| Narrative | Case studies, interviews, experience reports | Story sequence materially matters | Key Takeaways |
| Referential | Specs, APIs, schemas, catalogs | Structured tables/key-value are more efficient than tagged sentences | Gotchas |
If no preset clearly dominates, stay in the Core schema. Mixed documents should usually remain Core rather than forcing a preset. When in doubt, omit the preset.
See references/output-format.md for the Core schema, optional presets, and examples.
### Phase 2: Extract & Compress
Apply these rules strictly, in order:

- **Tag roles.** Assign each item a Core role unless the Referential preset makes tags unnecessary. Use only the roles actually needed:
  - `K` key claim
  - `S` support: rationale, mechanism, process, dependency, chronology
  - `X` applicability or qualifier: assumptions, caveats, constraints, exceptions, scope
  - `E` evidence or example
  - `Q` open question, unresolved issue, or explicit unknown
  - `A` agent-specific note, only when behavior genuinely changes for agents
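As an illustration of the tagging scheme (the exact item syntax is defined in references/output-format.md; the content and layout below are assumptions for this sketch, not from the source):

```markdown
- K: Write-through caching eliminated the stale-read bug class
  - S: reads always hit an entry the write path just refreshed
  - X: holds only while a single service owns the write path
  - E: zero stale-read incidents in the quarter after rollout
  - Q: behavior under multi-writer topologies is unverified
```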
- **Maximize compression ratio.** Cut every word without information content. One item = one tight sentence, two max. If it needs three, it is not condensed enough.
- **Group by theme, not source order.** Surface the underlying knowledge structure, not the document structure.
- **Preserve all distinct ideas.** Fewer words, not fewer ideas; nothing silently dropped. Merge redundancies explicitly into broader items.
- **Preserve source faithfulness.** Do not strengthen, universalize, or clean up the source's uncertainty:
  - keep modality: "can", "may", "often", "suggests", "appears", "in this case", "so far"
  - keep applicability: if the source limits who, when, where, or under what conditions a claim holds, preserve that
  - keep support structure when it explains the claim: cause → effect, problem → response, evidence → conclusion, premise → decision
  - keep unresolved uncertainty when it bounds confidence or actionability
  - do not convert descriptive observations into universal prescriptions unless the source clearly does so
- **Apply symbolic compression.** Use shorthand, symbols, and structured formats where they save tokens without losing clarity:
  - tables over prose for comparisons/mappings
  - → for implications/consequences
  - ∴ for conclusions
  - nested bullets over paragraphs
  - key-value pairs over sentences
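For instance (the content here is invented purely for illustration), a prose comparison compresses into a table plus symbols:

```markdown
Before: "Polling was simpler to build but wasted bandwidth; webhooks
required endpoint management yet delivered events immediately."

After:
| Approach | Cost             | Benefit          |
|----------|------------------|------------------|
| polling  | wasted bandwidth | simple to build  |
| webhooks | endpoint mgmt    | immediate events |
polling → delayed reactions ∴ prefer webhooks when latency-sensitive
```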
- **Compress examples selectively, not reflexively.** Drop examples only if they are pure illustration. Keep at least one compressed example or evidence anchor when it does one of these:
  - substantiates the claim rather than merely decorating it
  - calibrates magnitude, confidence, frequency, or scale
  - specifies a mechanism, procedure, exception, or edge case the abstract statement would hide
  - narrows applicability or clarifies who/when the claim is for
- **Close with a flat checklist section.** Use the preset-specific closing title when a preset is chosen; otherwise default to Notes, Risks, or Open Questions, whichever best matches the source.
### Phase 3: Agent Augmentation
Review extracted items through the agent lens. Not every item needs an agent note; add one only where behavior genuinely differs. For relevant items, ask:

- Does this change when the reader is an agent? → Add an agent note inline.
- Is there a silent failure mode for agents? → Call it out (agents don't infer conventions or accumulate cross-session context).
- Is mechanical enforcement available? → Note where the rule is enforced (linter, CI, test).
- Is an explicit injection point needed? → Agents need exact file paths, naming conventions, placement rules.
- Should agents maintain this knowledge? → Flag knowledge that could go stale so the agent updates it during its workflow.
For ideation- or narrative-heavy sources, agent augmentation is often minimal — focus it on actionability (what would an agent need to do differently?) rather than repeating the source.
### Phase 4: Faithfulness Audit
Before formatting the final output, run this short audit:

- **Reconstructability.** Can a fresh reader recover the source's main claims, support structure, and material limits from the condensed file alone?
- **Applicability.** Did any explicit assumption, exception, scope limiter, audience qualifier, or time bound disappear?
- **Support structure.** If the source justified a claim via reasoning, evidence, process, dependency, chronology, or causality, is that structure still present?
- **Evidence.** Is at least one anchor example or evidence signal preserved wherever the output would otherwise become an unsupported slogan?
- **Claim type.** Did any local observation, tentative finding, proposal, or hypothesis become a settled rule by accident?
- **Narrative trim.** If something was dropped, was it actually setup or repetition rather than a support, applicability, or confidence signal?

If any check fails, revise before saving.
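Parts of the claim-type and modality checks can be mechanized. A minimal sketch in Python, assuming plain-text inputs; the hedge-word list and the matching heuristic are illustrative assumptions, not part of the audit spec:

```python
# Flag hedge markers that appear in the source but not in the condensed
# output: a lost hedge often signals a tentative claim that silently
# became a settled rule.
HEDGES = {"can", "may", "might", "often", "suggests", "appears",
          "so far", "in this case"}

def lost_hedges(source: str, condensed: str) -> set[str]:
    """Return hedge markers present in source but missing from condensed."""
    def markers(text: str) -> set[str]:
        padded = f" {text.lower()} "
        # Crude delimiter-based match; a real audit would tokenize properly.
        return {h for h in HEDGES
                if f" {h} " in padded or f" {h}," in padded or f" {h}." in padded}
    return markers(source) - markers(condensed)

source = "Caching may help, and it often reduces tail latency."
condensed = "Caching reduces tail latency."
print(sorted(lost_hedges(source, condensed)))  # → ['may', 'often']
```

An analogous pass over applicability phrases ("only", "in our setup", "as of") extends the same idea to the applicability check.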
### Phase 5: Metadata & Output
Generate complete output with frontmatter metadata header and compressed body. Follow the exact format specification in references/output-format.md.
The frontmatter serves as a "card catalog entry" — other agents read only this block to decide whether to load the full document. Optimize it for stable retrieval metadata, not derived statistics.
- When the source contains strong scope or generalization limits, include `applicability` in frontmatter (see references/output-format.md).
- Only include `preset` in frontmatter when a named preset materially shaped the output and is useful to a future reader.
- Do not include `density` or `compression_ratio` in normal outputs; those are eval metrics, not default metadata.
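For orientation, a frontmatter block of the kind this phase emits might look like the following; every field name and value here is an assumption for illustration, and the authoritative format lives in references/output-format.md:

```yaml
---
# Illustrative sketch only; actual fields per references/output-format.md
title: "Cache outage postmortem (condensed)"
source: "docs/postmortems/cache-outage.md"
domain: reliability
tags: [caching, incident-response]
applicability: "single-region deployments; not yet validated multi-region"
---
```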
### Phase 6: Save
Save the condensed output to a file automatically after generation.