Use when converting technical books to skills with chapter detection, keyword extraction, and progressive disclosure.
Automated workflow for transforming OCR'd technical books into well-structured skills with chapter-based reference documents.
Use this skill when:
- The book is an EPUB (use converting-epub-to-markdown first)

| Phase | Purpose | Output |
|---|---|---|
| 1. Validation | Verify book file exists and is valid markdown | Book path confirmed |
| 2. TOC Analysis | Extract keywords from table of contents | Keyword list for description |
| 3. Chapter Detection | Identify chapter boundaries from headers | Chapter boundary markers |
| 4. Chunking | Split book into chapter files | references/chapters/chapter-XX.md |
| 5. Skill Generation | Create SKILL.md with chapter summaries | Main skill document |
| 6. Location | Determine skill category and placement | Skill directory path |
| 7. Verification | Test skill discovery with book topics | Skill loads on relevant queries |
Before starting conversion:
# Navigate to repository root
ROOT="$(git rev-parse --show-superproject-working-tree --show-toplevel | head -1)" && cd "$ROOT"
# Verify book file exists (user provides the path)
ls {path-to-book-file}.md
Input required from user:
For EPUB files: First use converting-epub-to-markdown skill to extract markdown:
Read(".claude/skill-library/claude/skill-management/converting-epub-to-markdown/SKILL.md")
Cannot proceed without valid markdown file ✅
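The validation gate above can be sketched as a small guard function. This is a minimal sketch, not part of the original workflow: `validate_book` is a hypothetical name, and treating "contains at least one H2 header" as the test for usable markdown is an assumption based on the chapter patterns this document relies on.

```shell
# validate_book is a hypothetical helper (assumption, not an existing tool).
# It returns 0 only if the file exists, is non-empty, and contains at least
# one markdown H2 header, the level used for chapters in this workflow.
validate_book() {
  book="$1"
  if [ ! -f "$book" ]; then
    echo "missing: $book" >&2
    return 1
  fi
  if [ ! -s "$book" ]; then
    echo "empty: $book" >&2
    return 1
  fi
  if ! grep -q '^## ' "$book"; then
    echo "no H2 headers found in: $book" >&2
    return 1
  fi
  # Report the line count so it can be compared with the expected book size.
  echo "ok: $book ($(wc -l < "$book" | tr -d ' ') lines)"
}
```

Calling `validate_book {path-to-book-file}.md || exit 1` before any other step would stop the conversion early when the input is unusable.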
Verify the book markdown is properly formatted:
# Check file size (OCR books are typically 10K-30K lines)
wc -l {path-to-book-file}.md
# Preview first 200 lines to identify structure
head -200 {path-to-book-file}.md
Look for:
- Chapter header patterns (e.g., `## CHAPTER 8`, `## 1  INFECTION VECTORS`)

Real examples from today's conversion:

- `## CHAPTER 8 System mechanisms` (line 534)
- `## 1  INFECTION VECTORS` (line 640)
- `## 1 EXAMINING PROCESSES` (line 548)

The TOC typically contains the best keywords for skill description:
# Find TOC section (usually marked with "Table of Contents" or "Contents")
grep -n -i "table of contents\|^contents$\|^## BRIEF CONTENTS" {path-to-book-file}.md
# Extract chapter titles from TOC (adjust line range based on grep results)
sed -n '100,300p' {path-to-book-file}.md | grep -i "chapter"
Manual keyword selection:
Real examples from today's conversion:
See: references/keyword-extraction-patterns.md for TOC parsing strategies.
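As a starting point for manual keyword selection, a word-frequency pass over the TOC lines can surface candidate terms. This is only a sketch: `toc_keywords` is a hypothetical helper, and the four-letter minimum, the exclusion of the word "chapter", and the top-20 cutoff are all arbitrary assumptions. The result still needs human review.

```shell
# toc_keywords is a hypothetical helper. Usage: toc_keywords FILE START END
# It prints the most frequent words (4+ letters) found in lines START..END,
# with the count in the first column.
toc_keywords() {
  sed -n "${2},${3}p" "$1" |
    tr '[:upper:]' '[:lower:]' |          # normalize case
    tr -cs '[:alpha:]' '\n' |             # one word per line, digits and punctuation dropped
    awk 'length($0) >= 4 && $0 != "chapter"' |
    sort | uniq -c | sort -k1,1nr -k2 | head -20
}
```

For example, `toc_keywords {path-to-book-file}.md 100 300` would cover the same line range as the `sed` command above, once that range has been adjusted to the actual TOC location.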
Identify the markdown header pattern used for chapters:
# Search for common chapter patterns
grep -n "^##\s\+CHAPTER\|^##\s\+[0-9]" {path-to-book-file}.md | head -20
Common patterns found in production:
- `## CHAPTER 8 System mechanisms` (Windows Internals - H2 with "CHAPTER" keyword)
- `## 1  INFECTION VECTORS` (Mac Malware - H2 with digit and double space)
- `## {digit}  {TITLE}` (most common - note the double space)

Record the pattern and line numbers. You'll use these exact boundaries for extraction.
CRITICAL: Verify chapter sequence
After detecting chapter markers, verify no chapters are missing:
# List all detected chapters in order
grep -n "^##\s\+CHAPTER\|^##\s\+[0-9]" {path-to-book-file}.md
# Check the sequence for gaps (e.g., 1,2,3,8,9,10 - missing 4-7)
Production note: In today's conversion, we successfully detected:
If the list looks incomplete or has gaps:
Check the table of contents to see which chapters should exist, then search for missing chapter numbers in the file.
See: references/ocr-gap-detection.md for gap detection techniques.
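The sequence check can be automated along these lines. This is a minimal sketch under assumptions: `find_chapter_gaps` is a hypothetical name, and it assumes chapter headers look like `## CHAPTER 8 ...` or `## 8 ...` and appear in ascending order. Any other H2 that begins with a digit would be counted as a chapter too, so the output is a hint and does not replace the TOC comparison.

```shell
# find_chapter_gaps is a hypothetical helper. It reads chapter headers of the
# form "## CHAPTER 8 ..." or "## 8 ..." and prints every number missing
# between consecutive detected chapters.
find_chapter_gaps() {
  grep -E '^##[[:space:]]+(CHAPTER[[:space:]]+)?[0-9]+' "$1" |
    sed -E 's/^##[[:space:]]+(CHAPTER[[:space:]]+)?([0-9]+).*/\2/' |
    awk 'NR > 1 { for (n = prev + 1; n < $1; n++) print "missing chapter " n }
         { prev = $1 }'
}
```

An empty result means no gaps were found between the first and last detected chapter. It does not show whether chapters are missing before the first or after the last one.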
🚨 CRITICAL REQUIREMENT:
SKILL.md files MUST ONLY reference chapter files in references/chapters/, NEVER the full Books/*.md files.
The Books/ files are too large to parse efficiently. All user-facing documentation in SKILL.md must point to chapter files only.
Chunking Requirements:
Validation:
# After splitting, check chunk sizes (both lines and tokens)
wc -l references/chapters/*.md
# Check token counts (character count / 4 = approximate tokens)
for f in references/chapters/*.md; do
chars=$(wc -c < "$f")
tokens=$((chars / 4))
echo "$f: $tokens tokens (approx)"
done
# Check for violations:
# - Any file >25,000 tokens (~100,000 characters)? → Need semantic splitting
# - Total count <5? → Need finer granularity (H2 sections)
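The two violation checks listed in the comments above could be wrapped in one function that fails loudly. This is a sketch, not the workflow's actual tooling: `check_chunks` is a hypothetical name, and it reuses the same characters-divided-by-four token approximation and the 25,000-token and five-file thresholds stated above.

```shell
# check_chunks is a hypothetical helper. Usage: check_chunks DIR [LIMIT]
# It prints one line per violation and returns non-zero if any is found.
check_chunks() {
  dir="$1"
  limit="${2:-25000}"      # token limit per chapter file
  status=0
  count=0
  for f in "$dir"/*.md; do
    [ -f "$f" ] || continue
    count=$((count + 1))
    chars=$(wc -c < "$f")
    tokens=$((chars / 4))  # approximate tokens, as in the loop above
    if [ "$tokens" -gt "$limit" ]; then
      echo "OVERSIZED: $f ($tokens tokens approx)"
      status=1
    fi
  done
  if [ "$count" -lt 5 ]; then
    echo "TOO FEW: only $count chapter files in $dir"
    status=1
  fi
  return "$status"
}
```

Running `check_chunks references/chapters` after splitting gives a single pass/fail signal that can gate the next phase.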
Step 4.5: Split Oversized Chapters (if needed)
If any chapter exceeds 25,000 tokens (~100,000 characters):
1. Check the token count: `chars=$(wc -c < chapter-NN.md); echo "$((chars / 4)) tokens (approx)"`
2. Find H2 section boundaries: `grep -n "^##[^#]" chapter-NN.md`
3. Split at a semantic boundary into `chapter-NN-part1.md` and `chapter-NN-part2.md`

Production example (Windows Part 2 Chapter 8 - 8,389 lines, ~33,556 tokens):
# Check token count first
chars=$(wc -c < chapter-08.md); echo "$((chars / 4)) tokens (approx)"
# Found semantic split at line 4300 (## Executive objects - major topic shift)
sed -n '1,4299p' chapter-08.md > chapter-08-part1.md # 4,299 lines, ~17,196 tokens ✓
sed -n '4300,8389p' chapter-08.md > chapter-08-part2.md # 4,090 lines, ~16,360 tokens ✓
rm chapter-08.md # Remove unsplit version
# Verify both parts are under 25,000 token limit
for f in chapter-08-part*.md; do
chars=$(wc -c < "$f")
echo "$f: $((chars / 4)) tokens (approx)"
done
Production example (Chapter 10 - 5,687 lines, ~22,748 tokens):
# Check if splitting needed
chars=$(wc -c < chapter-10.md); echo "$((chars / 4)) tokens (approx)"
# Split at line 2668 (## Windows Management Instrumentation)
sed -n '1,2668p' chapter-10.md > chapter-10-part1.md # 2,668 lines, ~10,672 tokens ✓
sed -n '2669,5687p' chapter-10.md > chapter-10-part2.md # 3,019 lines, ~12,076 tokens ✓
rm chapter-10.md
Result: Both chapters were split at semantic boundaries, and every part is under the 25,000-token limit.
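To narrow down where a split might go, one option is to list the H2 header closest to the middle of the file. This is a rough sketch: `suggest_split` is a hypothetical helper, and "closest to the midpoint" is an assumed heuristic for balanced parts. It says nothing about whether the header marks a real topic shift, so the candidate still needs to be checked by eye.

```shell
# suggest_split is a hypothetical helper. It prints the line number of the
# H2 header closest to the middle of the file, as a candidate split point.
suggest_split() {
  total=$(wc -l < "$1")
  grep -n '^##[^#]' "$1" | cut -d: -f1 |
    awk -v mid="$((total / 2))" '
      $1 == 1 { next }                      # skip a chapter title on line 1
      { d = $1 - mid; if (d < 0) d = -d }   # distance from the midpoint
      best == "" || d < bestd { best = $1; bestd = d }
      END { if (best != "") print best }'
}
```

If the function prints `N`, the split follows the same shape as the Chapter 8 example: lines `1` to `N-1` go to part 1 and lines `N` to the end go to part 2.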
Manual sed extraction (recommended - precise control):
# Create chapters directory
mkdir -p .claude/skill-library/{category}/{skill-name}/references/chapters
# Extract each chapter using line boundaries from Step 3
# Format: sed -n '{start},{end}p' {source} > {output}
# Example from today's conversion (Windows Part 2):
sed -n '534,8922p' {path-to-book-file}.md > .claude/skill-library/{path}/references/chapters/chapter-08.md
sed -n '8923,12692p' {path-to-book-file}.md > .claude/skill-library/{path}/references/chapters/chapter-09.md
# ... repeat for each chapter
# Verify extraction
wc -l .claude/skill-library/{path}/references/chapters/*.md
Production evidence from today:
Result: All chapters extracted successfully with exact content boundaries.
Expected output:
- `chapter-01.md` through `chapter-NN.md` - Main chapters (numbered to match book)

See: references/chapter-splitting-techniques.md for handling complex book structures.
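For books with many chapters, the per-chapter `sed` commands could be replaced by a single pass with `awk`. This is an alternative sketch, not the manual method recommended above: `extract_chapters` is a hypothetical helper, and it assumes every chapter header matches `## CHAPTER N ...` or `## N ...`. Lines before the first such header are skipped. If the same number matches twice (for example when TOC entries use the same header style), the later match overwrites the earlier file.

```shell
# extract_chapters is a hypothetical helper. Usage: extract_chapters BOOK OUTDIR
# It writes OUTDIR/chapter-NN.md for each detected chapter header, numbered
# to match the chapter number found in the header itself.
extract_chapters() {
  book="$1"
  outdir="$2"
  mkdir -p "$outdir"
  awk -v outdir="$outdir" '
    /^##[ \t]+(CHAPTER[ \t]+)?[0-9]/ {
      if (out != "") close(out)
      num = $0
      sub(/^##[ \t]+(CHAPTER[ \t]+)?/, "", num)   # leave "8 System mechanisms"
      out = sprintf("%s/chapter-%02d.md", outdir, num + 0)
    }
    out != "" { print > out }                     # copy lines into the current chapter
  ' "$book"
}
```

After running it, `wc -l` on the output directory, as in the verification step above, shows whether the chapter sizes match the boundaries recorded in Step 3.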
🚨 CRITICAL REQUIREMENT:
SKILL.md files MUST ONLY reference chapter files in references/chapters/, NEVER the source book file.
The source markdown files are too large to parse efficiently. All user-facing documentation in SKILL.md must point to chapter files only.
Required template:
---