Parse ebooks, extract concepts and entities with citation traceability, classify by type/layer, and synthesize across book collections.
You analyze ebooks to extract knowledge with full citation traceability. This skill supports two complementary extraction modes: Concept Extraction and Entity Extraction.
Every extraction must be traceable to its exact source. Citation traceability is non-negotiable. Extract less with full provenance rather than more without it.
Concept Extraction: for extracting IDEAS organized by abstraction level.
Use when: Analyzing a book for transferable ideas, building a concept taxonomy, understanding how abstract principles relate to concrete tactics.
Output: JSON files (analysis.json, concepts.json)
Example: "Spaced repetition improves retention" is a MECHANISM at Layer 2.
Entity Extraction: for extracting NAMED THINGS that can be cross-referenced across books.
Use when: Building a knowledge base where the same study, researcher, or framework appears in multiple books. The goal is entity resolution: recognizing that "Hogarth's framework" in Range is the same as "kind/wicked environments" mentioned elsewhere.
Output: Markdown files in knowledge base structure
Example: "Kind vs Wicked Environments" is a FRAMEWORK by Robin Hogarth.
| If you want to... | Use Mode |
|---|---|
| Understand a book's argument structure | Concept Extraction |
| Build a reference library across books | Entity Extraction |
| Create actionable takeaways | Concept Extraction |
| Track what researchers say across sources | Entity Extraction |
| Both | Run both modes sequentially |
| Type | What It Captures | Example |
|---|---|---|
| study | Research findings, experiments, data | Flynn Effect, Marshmallow Test |
| researcher | People and their contributions | Anders Ericsson, Robin Hogarth |
| framework | Mental models, taxonomies, systems | Kind vs Wicked, Desirable Difficulties |
| anecdote | Stories used to illustrate points | Tiger vs Roger, Challenger Disaster |
| concept | Ideas that aren't frameworks | Cognitive entrenchment, Match quality |
Some entities don't fit cleanly into the five types. Guidelines:
| Entity Kind | Use Type | Rationale |
|---|---|---|
| Simulations/Games (Superstruct, EVOKE) | anecdote | Illustrative events, even if hypothetical |
| Institutions (IFTF, WEF) | researcher | Organizations contribute ideas like individuals |
| Historical events (Challenger disaster) | anecdote | Stories that illustrate principles |
| Hypothetical scenarios | anecdote | Future scenarios from books like Imaginable |
| Thought experiments | framework | If systematic; otherwise concept |
When uncertain: Default to anecdote for narratives/events, concept for ideas, framework for systematic methods.
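The edge-case guidelines above can be read as a small lookup. The following is a minimal sketch of that reading; the `EdgeCaseKind` labels and the `resolveEdgeCaseType` function are hypothetical names for illustration and are not part of the skill's scripts.

```typescript
// The five entity types used by the knowledge base.
type EntityType = "study" | "researcher" | "framework" | "anecdote" | "concept";

// Hypothetical labels for the edge cases listed in the table above.
type EdgeCaseKind =
  | "simulation"
  | "institution"
  | "historical-event"
  | "hypothetical-scenario"
  | "thought-experiment";

// Map an edge-case kind to the entity type the guidelines recommend.
// `systematic` only matters for thought experiments.
function resolveEdgeCaseType(kind: EdgeCaseKind, systematic = false): EntityType {
  switch (kind) {
    case "simulation":
    case "historical-event":
    case "hypothetical-scenario":
      return "anecdote";
    case "institution":
      return "researcher";
    case "thought-experiment":
      return systematic ? "framework" : "concept";
  }
}

console.log(resolveEdgeCaseType("institution")); // researcher
console.log(resolveEdgeCaseType("thought-experiment", true)); // framework
```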
When the book's author is also a significant entity (e.g., Jane McGonigal in Imaginable):
Create a researcher entity if:
Skip if:
Template addition for author-subjects:
## Note
This researcher is the author of [Book] in our collection. Their frameworks and concepts are documented separately.
# [Entity Name]
**Type:** study | researcher | framework | anecdote | concept
**Status:** stub | partial | solid | authoritative
**Last Updated:** YYYY-MM-DD
**Aliases:** alias1, alias2, alias3
## Summary
[2-3 sentence synthesized understanding]
## Key Findings / What It Illustrates
1. [Claim or finding with source]
— Source: [Book], Ch.[X]
2. [Another claim]
— Source: [Book], Ch.[X]
## Key Quotes
> "Quotable text here."
> "Another memorable quote."
## Sources in Collection
| Book | Author | How It's Used | Citation |
|------|--------|---------------|----------|
| Range | Epstein | [Role in book] | Ch.X |
## Sources NOT in Collection
- [Book that would enrich this entity]
## Related Entities
- [Other Entity](../type/other-entity.md) - Relationship description
## Open Questions
- [What we don't yet know]
/knowledge/
├── _index.md                    # Master registry
├── _entities.json               # Searchable index (generated)
│
├── nonfiction/
│   ├── _index.md                # Domain index
│   ├── _[book]-quotes.md        # Book-specific quotes file
│   ├── studies/
│   │   ├── flynn-effect.md
│   │   └── chase-simon-chunking.md
│   ├── researchers/
│   │   ├── hogarth-robin.md
│   │   └── tetlock-philip.md
│   ├── frameworks/
│   │   ├── kind-vs-wicked-environments.md
│   │   └── desirable-difficulties.md
│   ├── anecdotes/
│   │   ├── tiger-vs-roger.md
│   │   └── challenger-disaster.md
│   └── concepts/
│       ├── cognitive-entrenchment.md
│       └── match-quality.md
│
├── cooking/                     # Domain-specific structure
│   ├── techniques/
│   ├── ingredients/
│   └── equipment/
│
└── technical/
    ├── patterns/
    └── technologies/
Quotable quotes are a distinct extraction type. For each book, create a quotes file:
File: _[book-slug]-quotes.md
Structure:
# Quotable Quotes from [Book Title]
**Author:** [Author]
**Last Updated:** YYYY-MM-DD
## On [Theme 1]
> "Quote text here."
> "Another quote on same theme."
## On [Theme 2]
> "Quote on different theme."
What makes a good quote:
- Run kb-resolve-entity.ts to see if the entity already exists
- Run kb-generate-index.ts

| State | Symptoms | Intervention |
|---|---|---|
| KB0 | No knowledge base | Create directory structure |
| KB1 | Structure exists, no entities | Begin extraction |
| KB2 | Extracting from book | Create entity files |
| KB3 | Entities created, not linked | Add Related Entities |
| KB4 | Linked, no index | Run kb-generate-index.ts |
| KB5 | Complete for this book | Proceed to next book |
Triggered when: 2+ books have been extracted to the knowledge base.
Goals:
Process:
Entity overlap detection
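# Find entities with 2+ sources: count data rows in each "Sources in Collection" table
shopt -s globstar  # bash 4+: make ** match nested directories
for f in knowledge/nonfiction/**/*.md; do
  n=$(awk '/^## Sources in Collection/{t=1;next} /^## /{t=0} t&&/^[|]/{c++} END{print c-2}' "$f")
  [ "$n" -ge 2 ] && echo "$f"
done | head -20
Or manually review entities updated with a new source.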
Conceptual connection mapping
Synthesis documentation
For entities appearing in 2+ books, update the Summary section:
## Summary
[Synthesized understanding from BOTH sources, noting agreements and differences]
Cross-book insights
Document thematic connections in context/insights/cross-book-{theme}.md:
# Cross-Book Insight: [Theme]
## Books Contributing
- Range (Epstein) - [perspective]
- Imaginable (McGonigal) - [perspective]
## Synthesis
[How the books complement or contradict each other]
| Type | Definition | Example |
|---|---|---|
| Principle | Foundational truth or axiom | "Communities form around shared identity" |
| Mechanism | How something works | "Reciprocity creates social bonds" |
| Pattern | Recurring structure or approach | "The community lifecycle pattern" |
| Strategy | High-level approach to achieve goals | "Build trust before asking for contribution" |
| Tactic | Specific actionable technique | "Send welcome emails within 24 hours" |
| Layer | Name | Abstraction | Example |
|---|---|---|---|
| 0 | Foundational | Universal principles | "Humans seek belonging" |
| 1 | Theoretical | Domain-specific theory | "Community requires shared purpose" |
| 2 | Strategic | Approaches and frameworks | "The funnel model of engagement" |
| 3 | Tactical | Specific methods | "Onboarding sequences" |
| 4 | Specific | Concrete implementations | "Use Discourse for forums" |
| Relationship | Meaning | When to Use |
|---|---|---|
| INFLUENCES | A affects B | Causal or correlational connection |
| SUPPORTS | A provides evidence for B | Citation, example, validation |
| CONTRADICTS | A conflicts with B | Opposing claims |
| COMPOSED_OF | A contains B | Part-whole relationships |
| DERIVES_FROM | A is derived from B | Logical conclusions |
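The three tables above (types, layers, relationships) suggest a simple record shape for extracted concepts. The sketch below is one possible encoding, not the documented schema of concepts.json: the field names, the ids, and the type/layer pairings in the sample data are assumptions for illustration. It also shows a basic integrity check: every relationship should point at concepts that exist.

```typescript
type ConceptType = "principle" | "mechanism" | "pattern" | "strategy" | "tactic";
type Layer = 0 | 1 | 2 | 3 | 4;
type RelationshipKind =
  | "INFLUENCES"
  | "SUPPORTS"
  | "CONTRADICTS"
  | "COMPOSED_OF"
  | "DERIVES_FROM";

// Assumed field names; the real concepts.json layout may differ.
interface Concept {
  id: string;
  statement: string;
  type: ConceptType;
  layer: Layer;
}

interface Relationship {
  from: string; // id of concept A
  to: string; // id of concept B
  kind: RelationshipKind;
}

// Return relationships whose endpoints are not both known concept ids.
function findDanglingRelationships(
  concepts: Concept[],
  relationships: Relationship[],
): Relationship[] {
  const knownIds = concepts.map((concept) => concept.id);
  return relationships.filter(
    (rel) => knownIds.indexOf(rel.from) === -1 || knownIds.indexOf(rel.to) === -1,
  );
}

// Illustrative data: statements come from the layer table, pairings are assumed.
const concepts: Concept[] = [
  { id: "c1", statement: "Humans seek belonging", type: "principle", layer: 0 },
  { id: "c2", statement: "Community requires shared purpose", type: "principle", layer: 1 },
];
const relationships: Relationship[] = [
  { from: "c2", to: "c1", kind: "DERIVES_FROM" },
  { from: "c2", to: "c9", kind: "SUPPORTS" }, // c9 does not exist
];

console.log(findDanglingRelationships(concepts, relationships).length); // 1
```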
| State | Symptoms | Intervention |
|---|---|---|
| EA0 | No input file | Guide file preparation |
| EA1 | Raw file, not parsed | Run ea-parse.ts |
| EA2 | Parsed, not extracted | LLM extracts concepts |
| EA3 | Extracted, not classified | Assign types and layers |
| EA4 | Classified, not annotated | Add themes, relationships |
| EA5 | Single book complete | Export or proceed to synthesis |
| EA6 | Multi-book ready | Cross-book synthesis |
| EA7 | Analysis complete | Generate reports |
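For a single book, states EA0 through EA5 form a linear progression, so the current state can be diagnosed by checking which step is the first one not yet done. The sketch below assumes a hypothetical `BookProgress` record of boolean flags; how those flags are actually determined (for example, by checking which output files exist) is left open. EA6 and EA7 depend on multiple books and are not covered here.

```typescript
type SingleBookState = "EA0" | "EA1" | "EA2" | "EA3" | "EA4" | "EA5";

// Hypothetical progress flags for one book.
interface BookProgress {
  hasInputFile: boolean;
  parsed: boolean;
  conceptsExtracted: boolean;
  classified: boolean;
  annotated: boolean;
}

// Interventions copied from the state table above.
const INTERVENTIONS: Record<SingleBookState, string> = {
  EA0: "Guide file preparation",
  EA1: "Run ea-parse.ts",
  EA2: "LLM extracts concepts",
  EA3: "Assign types and layers",
  EA4: "Add themes, relationships",
  EA5: "Export or proceed to synthesis",
};

// The state is the first step in the pipeline that has not been completed.
function diagnose(progress: BookProgress): SingleBookState {
  if (!progress.hasInputFile) return "EA0";
  if (!progress.parsed) return "EA1";
  if (!progress.conceptsExtracted) return "EA2";
  if (!progress.classified) return "EA3";
  if (!progress.annotated) return "EA4";
  return "EA5";
}

const current = diagnose({
  hasInputFile: true,
  parsed: true,
  conceptsExtracted: false,
  classified: false,
  annotated: false,
});
console.log(current, "->", INTERVENTIONS[current]); // EA2 -> LLM extracts concepts
```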
- Run ea-parse.ts to chunk book with position tracking

Parse ebook files into chunks with metadata and position tracking.
deno run --allow-read scripts/ea-parse.ts path/to/book.txt
deno run --allow-read scripts/ea-parse.ts path/to/book.epub --format epub
deno run --allow-read scripts/ea-parse.ts book.txt --chunk-size 1500 --overlap 150
Output: JSON with metadata, chapters (if detected), and chunks with positions.
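To make "chunks with positions" concrete, here is a minimal sketch of overlapping chunking. It is not the implementation of ea-parse.ts: it assumes chunk size and overlap are measured in characters (the script may use words or tokens), and the `TextChunk` shape and `chunkWithOverlap` name are hypothetical.

```typescript
// Assumed chunk shape: offsets index into the original text.
interface TextChunk {
  index: number;
  start: number; // inclusive character offset
  end: number; // exclusive character offset
  text: string;
}

// Split text into chunks of at most `chunkSize` characters, where each chunk
// repeats the last `overlap` characters of the previous one.
function chunkWithOverlap(text: string, chunkSize: number, overlap: number): TextChunk[] {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new RangeError("require chunkSize > 0 and 0 <= overlap < chunkSize");
  }
  const chunks: TextChunk[] = [];
  const step = chunkSize - overlap;
  for (let start = 0; start < text.length; start += step) {
    const end = Math.min(start + chunkSize, text.length);
    chunks.push({ index: chunks.length, start, end, text: text.slice(start, end) });
    if (end === text.length) break; // last chunk reached the end of the text
  }
  return chunks;
}

const demo = chunkWithOverlap("abcdefghij", 4, 1);
console.log(demo.map((chunk) => `${chunk.start}-${chunk.end}:${chunk.text}`).join(" "));
// 0-4:abcd 3-7:defg 6-10:ghij
```

Because each chunk records its `start` and `end`, a quote found inside a chunk can be mapped back to its position in the full text by adding the chunk's `start` to the quote's offset within the chunk.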
Scan knowledge base and generate searchable entity index.
deno run --allow-read --allow-write scripts/kb-generate-index.ts /path/to/knowledge
Output: Creates _entities.json with all entities, aliases, and metadata.
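As an illustration of what an index entry could be built from, the sketch below reads the header fields of an entity file that follows the entity template. The `EntityIndexEntry` shape is an assumption; the actual schema of _entities.json is not documented here, and `parseEntityHeader` is a hypothetical helper, not a function from kb-generate-index.ts.

```typescript
// Assumed shape of one index entry; the real _entities.json may differ.
interface EntityIndexEntry {
  name: string;
  type: string;
  status: string;
  aliases: string[];
}

// Read a `**Label:** value` line; returns "" when the label is absent.
function readField(markdown: string, label: string): string {
  const match = markdown.match(new RegExp(`^\\*\\*${label}:\\*\\*\\s*(.*)$`, "m"));
  return match?.[1]?.trim() ?? "";
}

// Extract the name (first H1), type, status, and aliases from an entity file.
function parseEntityHeader(markdown: string): EntityIndexEntry {
  const title = markdown.match(/^# (.+)$/m);
  return {
    name: title?.[1]?.trim() ?? "",
    type: readField(markdown, "Type"),
    status: readField(markdown, "Status"),
    aliases: readField(markdown, "Aliases")
      .split(",")
      .map((alias) => alias.trim())
      .filter((alias) => alias.length > 0),
  };
}

const sampleEntity = [
  "# Kind vs Wicked Environments",
  "**Type:** framework",
  "**Status:** stub",
  "**Last Updated:** YYYY-MM-DD",
  "**Aliases:** kind/wicked environments, Hogarth's framework",
  "## Summary",
].join("\n");

console.log(parseEntityHeader(sampleEntity).aliases.length); // 2
```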
Search for existing entities before creating duplicates.
deno run --allow-read scripts/kb-resolve-entity.ts "Flynn Effect"
deno run --allow-read scripts/kb-resolve-entity.ts "Hogarth" --threshold 0.5
deno run --allow-read scripts/kb-resolve-entity.ts "kind learning" --json
Options:
- --threshold <0-1> - Minimum match score (default: 0.3)
- --limit <n> - Maximum results (default: 5)
- --json - Output as JSON

Validate analysis output for citation accuracy and schema completeness.
deno run --allow-read scripts/ea-validate.ts analysis.json --report
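One way to think about citation accuracy is that a recorded quote should be found verbatim at its recorded position in the source text. The sketch below illustrates that check under stated assumptions: the `CitationRecord` fields and the three status values are hypothetical, and the checks ea-validate.ts actually performs may differ.

```typescript
// Assumed citation shape with character offsets into the parsed book text.
interface CitationRecord {
  quote: string; // exact words from the book
  chapter: string; // chapter reference, e.g. "Ch.1"
  start: number; // offset where the quote begins
  end: number; // offset just past the end of the quote
}

type CitationStatus = "exact" | "relocated" | "missing";

// "exact": quote sits at the recorded offsets.
// "relocated": quote exists in the text but at a different position.
// "missing": quote does not appear verbatim anywhere in the text.
function checkCitation(sourceText: string, citation: CitationRecord): CitationStatus {
  if (sourceText.slice(citation.start, citation.end) === citation.quote) {
    return "exact";
  }
  return sourceText.indexOf(citation.quote) !== -1 ? "relocated" : "missing";
}

const bookText = "Spaced repetition improves retention. Reciprocity creates social bonds.";
const quoted = "Reciprocity creates social bonds";
const offset = bookText.indexOf(quoted);

console.log(
  checkCitation(bookText, { quote: quoted, chapter: "Ch.1", start: offset, end: offset + quoted.length }),
); // exact
```

A "relocated" result usually means the offsets were recorded against a different parse of the book; a "missing" result means the quote was paraphrased and should be re-extracted.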
Pattern: Extracting every potentially interesting phrase. Fix: Ask "Would I cite this?" before extracting. Quality over quantity.
Pattern: Extracting without preserving exact quotes or positions. Fix: Always capture: exact quote, chapter reference, context.
Pattern: Creating new entity without checking if it exists.
Fix: Always run kb-resolve-entity.ts first.
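The idea behind such a duplicate check can be sketched as fuzzy matching of a query against entity names and aliases. The example below uses a Dice coefficient over character bigrams, which is an assumption chosen for illustration; the scoring used by kb-resolve-entity.ts is not documented here. Only the default threshold (0.3) and limit (5) are taken from the script's documented options.

```typescript
interface KnownEntity {
  name: string;
  aliases: string[];
}

interface EntityMatch {
  name: string;
  score: number;
}

// Distinct lowercase character bigrams, with punctuation treated as spaces.
function bigrams(label: string): string[] {
  const normalized = label.toLowerCase().replace(/[^a-z0-9]+/g, " ").trim();
  const grams: string[] = [];
  for (let i = 0; i + 2 <= normalized.length; i++) {
    const gram = normalized.slice(i, i + 2);
    if (grams.indexOf(gram) === -1) grams.push(gram);
  }
  return grams;
}

// Dice coefficient: 2 * shared bigrams / total bigrams, between 0 and 1.
function similarity(a: string, b: string): number {
  const left = bigrams(a);
  const right = bigrams(b);
  if (left.length === 0 || right.length === 0) return 0;
  const shared = left.filter((gram) => right.indexOf(gram) !== -1).length;
  return (2 * shared) / (left.length + right.length);
}

// Score each entity by its best-matching name or alias, then filter and rank.
function resolveEntity(
  query: string,
  entities: KnownEntity[],
  threshold = 0.3,
  limit = 5,
): EntityMatch[] {
  return entities
    .map((entity) => ({
      name: entity.name,
      score: Math.max(
        ...[entity.name, ...entity.aliases].map((label) => similarity(query, label)),
      ),
    }))
    .filter((match) => match.score >= threshold)
    .sort((x, y) => y.score - x.score)
    .slice(0, limit);
}

const knownEntities: KnownEntity[] = [
  {
    name: "Kind vs Wicked Environments",
    aliases: ["Hogarth's framework", "kind/wicked environments"],
  },
  { name: "Flynn Effect", aliases: [] },
];

console.log(resolveEntity("kind wicked environments", knownEntities).map((m) => m.name));
```

Matching against aliases as well as names is what lets "Hogarth's framework" and "kind/wicked environments" resolve to the same entity file.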
Pattern: Entities without Related Entities links. Fix: Every entity should connect to at least 2 others.
Pattern: Entity captures ideas but no memorable phrasing. Fix: Include Key Quotes section with author's exact words.
Pattern: Analyzing books without cross-referencing. Fix: After 2+ books, run synthesis to find connections.
1. Scan book chapter by chapter
2. Identify all named studies, researchers, frameworks, anecdotes
3. Create inventory document listing all potential entities
4. For each entity:
a. kb-resolve-entity.ts "[entity name]" to check existence
b. Create markdown file in appropriate type directory
c. Fill in template with findings and citations
d. Add Key Quotes section
5. Create _range-quotes.md with all memorable quotes
6. Update _index.md with new entities
7. kb-generate-index.ts to rebuild _entities.json
1. ea-parse.ts book.txt --chunk-size 2000
2. For each chunk, extract top 3-5 concepts
3. Classify by type and layer
4. Generate concepts.json and report.md
| File | Location |
|---|---|
| Entity files | knowledge/{domain}/{type}/{entity-slug}.md |
| Quotes file | knowledge/{domain}/_[book]-quotes.md |
| Entity index | knowledge/_entities.json |
| Domain index | knowledge/{domain}/_index.md |
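The entity file location can be derived from the domain, type, and name. The sketch below is an assumption-laden illustration: it maps each type to the plural directory names shown in the nonfiction tree, and `slugify` and `entityPath` are hypothetical helpers. Other domains (such as cooking) use their own subdirectories and are not covered.

```typescript
// Directory names as they appear under knowledge/nonfiction/.
const TYPE_DIRECTORIES = {
  study: "studies",
  researcher: "researchers",
  framework: "frameworks",
  anecdote: "anecdotes",
  concept: "concepts",
} as const;

type KbEntityType = keyof typeof TYPE_DIRECTORIES;

// Lowercase, replace runs of non-alphanumerics with "-", trim stray hyphens.
function slugify(name: string): string {
  return name
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Build knowledge/{domain}/{type directory}/{entity-slug}.md
function entityPath(domain: string, type: KbEntityType, name: string): string {
  return `knowledge/${domain}/${TYPE_DIRECTORIES[type]}/${slugify(name)}.md`;
}

console.log(entityPath("nonfiction", "framework", "Kind vs Wicked Environments"));
// knowledge/nonfiction/frameworks/kind-vs-wicked-environments.md

// Researcher files in the tree are surname-first, so pass the name that way.
console.log(entityPath("nonfiction", "researcher", "Hogarth, Robin"));
// knowledge/nonfiction/researchers/hogarth-robin.md
```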
| File | Location |
|---|---|
| Full analysis | ebook-analysis/{author}-{title}/analysis.json |
| Concepts only | ebook-analysis/{author}-{title}/concepts.json |
| Citations | ebook-analysis/{author}-{title}/citations.json |
| Report | ebook-analysis/{author}-{title}/report.md |
| Source | Leads to |
|---|---|
| research | Multi-book synthesis ready |
| reverse-outliner | Structural data for concept extraction |
| From State | Leads to |
|---|---|
| Entity extraction complete | dna-extraction (deep functional analysis) |
| Concept extraction complete | media-meta-analysis (cross-source synthesis) |
| Skill | Relationship |
|---|---|
| dna-extraction | 6-axis functional analysis for annotation |
| reverse-outliner | Structural approach for fiction |
| voice-analysis | Author style fingerprinting |
| context-network | Knowledge base maintenance |
| Book Type | Expected Entities | Estimated Effort |
|---|---|---|
| Dense non-fiction (Range; Thinking, Fast and Slow) | 60-100 | 4-6 hours |
| Moderate non-fiction (most business books) | 30-50 | 2-3 hours |
| Light non-fiction (popular science) | 15-30 | 1-2 hours |
| Technical books | 20-40 | 2-3 hours |
Different non-fiction subtypes yield different entity profiles:
| Subtype | Example | Entity Profile | Expected Count |
|---|---|---|---|
| Research synthesis | Range | Many studies, researchers, frameworks | 60-100 |
| Methodological/How-to | Imaginable | Many frameworks, few studies | 30-50 |
| Memoir/Narrative | Educated | Few frameworks, many anecdotes | 20-40 |
| Reference | Technical manuals | Many concepts, few anecdotes | Variable |
Research synthesis books cite many studies and researchers, connecting ideas across domains. Methodological books teach techniques and frameworks but cite fewer external sources. Memoir/narrative books use personal stories to illustrate points rather than research.
Book classification metadata (Calibre tags, library categories) is often:
Always verify classification makes sense before extraction. A "fiction" tag on a methodology book like Imaginable is a metadata error.
Use extended thinking for:
Trigger phrases: "synthesize across books", "find contradictions", "identify gaps", "comprehensive analysis"