Content analysis for documents — PDFs, gists, articles, markdown, arxiv papers, code snippets, meeting notes. Dual-lens (Creator View + Engineer View). Two user-invokable depths (Standard / Deep); Quick Scan is triage- only. Outputs to .research/analysis/<doc-slug>/.
Document Version: 2.0 Last Updated: 2026-04-15 Status: ACTIVE
<!-- prettier-ignore-end -->Shared conventions: See .claude/skills/shared/CONVENTIONS.md
/analyze router: This skill is the document-handler arm of /analyze —
direct invocation and router dispatch both supported. Handoff contract: the
router passes {target, auto_detected_type: "document"} as if the skill were
invoked directly.
Analysis of documents as knowledge artifacts. Mirrors /repo-analysis structure
for cohesion with sibling CAS handlers. Handles any text-readable source: PDFs,
GitHub gists, articles, markdown files, arxiv papers, code snippets, meeting
notes.
/document-analysis <path-or-url> [--depth=standard|deep]
depth: standard (default) | deep | quick (triage)
phases: PHASE N of M (M = 9 Standard / Deep, 1 Quick)
est. time: Quick ~15s | Standard 2-4m | Deep 5-10m
output: .research/analysis/<slug>/
prior feedback: {replay per CONVENTIONS §18 if prior state file exists}
| You want to… | Use this |
|---|---|
| Analyze one PDF / gist / article / paper | /document-analysis (here) |
| Let router auto-pick repo vs site vs PDF | /analyze <target> |
| Cross-source synthesis across 3+ | /synthesize |
| GitHub repo | /repo-analysis |
| Website / blog | /website-analysis |
| Video / audio | /media-analysis |
.claude/state/document-analysis.<slug>.state.json after each phase..claude/skills/ listing, MEMORY.md user entries.scripts/lib/analysis-schema.js. See CONVENTIONS.md §12.pdftoppm not found, use pdfjs-dist via
node: require('pdfjs-dist/legacy/build/pdf.mjs') to extract text per page.
Install: npm install pdfjs-dist --no-save. Permanent fix:
winget install poppler./repo-analysis/website-analysis/media-analysis/webapp-testing/document-analysis <path-or-url>
/document-analysis <path> --depth=standard
/document-analysis <path> --depth=deep
/document-analysis <path> --depth=quick # triage only
Type detection (when called via /analyze router):
| Input Pattern | Detected As |
|---|---|
*.pdf | PDF document |
*.md, *.txt | Markdown/text |
gist.github.com/* | GitHub gist |
arxiv.org/* | Academic paper |
| Other non-repo, non-website URL | Article |
.research/analysis/<doc-slug>/ produces:
| Artifact | Phase | Format / Notes |
|---|---|---|
analysis.json | 0+ | Core record (schema v3.0) |
findings.jsonl | 3/5 | One JSON object per line |
creator-view.md | 4 | Conversational prose, 6 sections |
summary.md | 5 | Health bands |
value-map.json | 6 | Candidates ranked |
deep-read.md | 2 | Internal references catalog |
content-eval.jsonl | 3.5 | Evaluated references |
coverage-audit.jsonl | 6b | Unexplored items + user decisions |
extraction-journal.jsonl | routing | Append-only cross-source record |
Full schemas in REFERENCE.md §1.
VALIDATE Guards -> File exists? URL reachable? Supported type? Prior feedback replay (§18)?
PHASE 0 of 9 Quick Scan -> Read first page/section, classify, lightweight creator lens
GATE Interactive -> "Run Standard/Deep?" — flag bypasses
PHASE 1 of 9 Content Load -> Read full document
PHASE 2 of 9 Deep Read -> Internal references, citations, linked resources
PHASE 3 of 9 Dimension Wave -> 6 dimensions: depth, methodology, actionability, novelty, clarity, source quality
PHASE 3.5 of 9 Content Eval -> Evaluate embedded references — BEFORE Creator View
PHASE 4 of 9 Creator View -> 6 sections, home context comparison
PHASE 5 of 9 Engineer View -> Quality bands via shared scoring
PHASE 6 of 9 Value Map -> Pattern/knowledge/content/anti-pattern candidates
PHASE 6b of 9 Coverage Audit -> Unread sections, unfollowed references
PHASE 6c of 9 Tag Suggestion -> Per _shared/TAG_SUGGESTION.md
SELF-AUDIT + ROUTING
v2.0 phase renumber (BREAKING): Per Cat 2-E + Pattern 10 combined fix:
State files with old phase labels auto-migrate on resume through v2.2 transition window.
Phase markers: ========== PHASE N of M: [NAME] ==========
Read the first page (PDF) or first 100 lines (text/markdown). For URLs, fetch and read the first screenful. Classify the document type and produce a lightweight assessment.
Lightweight creator lens (MUST): Write 2-3 sentences — "This document appears to cover/argue/teach X." Enough to judge Standard/Deep worth.
Gate: Quick Scan is a preview — not a peer user tier. Standard produces
the full artifact set needed for /synthesize. All --depth flags bypass.
source_tier: Documents span T1-T3. Default T2; academic papers T1,
anonymous gists T3.
Done when: analysis.json (Quick tier) exists AND creator-lens teaser written.
pages parameter. >20 pages → batch. Note total.Scope-explosion soft prompt: PDF with >100 pages → prompt:
"Document is N pages. Analyze all / first 50 pages / table of contents only / custom?"
User decides; never hard-block.
Done when: full text loaded into working memory AND page/line count recorded in state.
Identify and catalog:
Output: deep-read.md listing what was found and what knowledge it
contains.
Done when: deep-read.md exists AND references cataloged for Phase 3.5.
Score the document across 6 dimensions (0-100, band system):
| Dimension | What It Measures |
|---|---|
| Content Depth | How thoroughly topics are covered |
| Methodology | Rigor of reasoning, evidence quality |
| Actionability | How directly applicable the ideas are |
| Novelty | Original insights vs rehashed knowledge |
| Clarity | Writing quality, organization, accessibility |
| Source Quality | Author credibility, citation quality, recency |
Done when: all 6 dimension scores written to findings.jsonl.
Phase renumbered from 4b to 3.5 in v2.0 — runs BEFORE Creator View and feeds into it.
Evaluate embedded references for relevance to home context. For each reference: category, name, URL, relevance (high/medium/low/none), applicability, home_connection.
Done when: content-eval.jsonl exists AND every reference has a relevance rating AND feeds Creator View §2.
Same 6 sections as repo-analysis. Written in conversational prose.
Load home context before writing. Reference specific content from Deep Read (Phase 2) and Content Eval (Phase 3.5).
Done when: creator-view.md exists AND all 6 sections written AND home-context claims reference actual files/projects.
Quality bands using shared scoring. 6 dimensions from Phase 3 produce summary bands. Absence pattern classification where applicable.
Done when: summary.md contains all 6 bands.
4 candidate types: pattern, knowledge, content, anti-pattern. Same ranking fields (novelty, effort, relevance).
Promotion rules (MUST):
Write value-map.json. Update .research/extraction-journal.jsonl.
Done when: value-map.json exists AND promotion rules applied.
Scan for unread sections, unfollowed references, unexplored citations.
Interactive prompt: Analyze all / Select / Skip. Record in
coverage-audit.jsonl.
Done when: every item has user_decision (analyze/skip) or
status: analyzed.
Follow the canonical protocol in
.claude/skills/_shared/TAG_SUGGESTION.md. Per
CONVENTIONS §14: at least 3 semantic tags per entry, 8 categories, no upper
bound.
Signal sources for document-analysis: creator-view.md, entry notes,
engineer-view.md, cited references.
Done when: user-approved tags written to analysis.json.tags AND each
extraction-journal.jsonl row.
| Gate | Default |
|---|---|
--depth unspecified | standard |
| Quick → Standard gate unanswered | proceed to Standard |
| Scope-explosion (>100 pages) | first 50 pages |
| Coverage Audit unanswered | skip all (logged) |
| Tag Suggestion unanswered | never auto-approve — block |
| Routing menu unanswered | 7. Done (cleanup + invocation track) |
| Prior Feedback Replay (CONV §18) | continue unchanged (logged as shown) |
Tag Suggestion auto-approve is forbidden (CONVENTIONS §14.6).
Full guard catalog — REFERENCE §9.
Per CONVENTIONS §8 plus domain checks:
analysis.json.tags non-empty (user-approved)user_decisionphase-2-deep-read before
phase-3-dimension-wave, phase-3.5-content-eval before
phase-4-creator-view, phase-6c-tags before self-auditprior_feedback_shown: true if prior state
existed (CONVENTIONS §18)Same 8 options as repo-analysis:
| Option | Action |
|---|---|
| 1. Extract value | Present candidates from value-map |
| 2. Send to TDMS | Transform findings to TDMS format |
| 3. Deep-plan this | Inject analysis as research context |
| 4. Save to memory | Persist key findings |
| 5. Adoption verdict | Full assessment (if applicable) |
| 6. Explore insights | Deeper conversation about findings |
| 7. Done | Cleanup, confirm artifacts, track invocation |
| 8. Cross-source synthesis | If 3+ sources analyzed, offer /synthesize |
State file: .claude/state/document-analysis.<slug>.state.json.
Update after every phase. On re-invocation with same input: offer Resume /
Re-run / Compare. State file stores process_feedback (nullable) from retro.
/repo-analysis, /website-analysis, /media-analysis/analyze (auto-detects document sources)/synthesize (cross-source, requires 3+ sources)/deep-plan (as research context), JASON-OSlast_synthesized_at field on
analysis.json when writing — this field is set by /synthesize Phase 5 and
must not be dropped by handler re-runs (v2.0 contract, Session #284)..research/extraction-journal.jsonl,
.research/EXTRACTIONS.md, .research/reading-chain.jsonlRetro (CONVENTIONS §10): Before the routing menu, ask: "What worked well?
What would you change next time?" Save to process_feedback in the state file.
Optional structured dimensions: worked_well, would_change, longest_phase,
signal_quality.
Prior Feedback Replay (CONVENTIONS §18): On re-invocation for the same
target, replay prior process_feedback during VALIDATE and ask whether to
adjust approach. Log prior_feedback_shown: true.
Invocation tracking — at Done routing:
cd scripts/reviews && npx tsx write-invocation.ts --data '{
"skill":"document-analysis","type":"skill","success":true,
"schema_version":1,"completeness":"stub",
"origin":{"type":"manual"},
"context":{"target":"DOC_SLUG","mode":"document","depth":"DEPTH",
"score":SCORE,"decisions":DECISION_COUNT,
"candidates":CANDIDATE_COUNT}
}'
| Version | Date | Description |
|---|---|---|
| 2.0 | 2026-04-15 | Skill-audit batch 2026-04-15-analysis-quartet Wave 2. Breaking: phase renumber — Phase 2 (Dimension Wave) → Phase 3, Phase 2b (Deep Read) → Phase 2, Phase 4b (Content Eval) → Phase 3.5 (Cat 2-E + Pattern 10 combined). Structural: /analyze router ack, Warm-up, Routing Guide, NEW Integration section, NEW Retro section, NEW invocation tracking, Delegation & Defaults, consolidated Guard Rails top-5, PDF scope-explosion soft prompt (>100 pages), Done-when gates, PHASE N of M, Tag Suggestion → _shared ref, Prior Feedback Replay per CONVENTIONS §18, output table. T28 tagline removed from user-visible description (preserved in v1.0 history below). |
| 1.1 | 2026-04-09 | PDF fallback for Windows, add summary.md + deep-read.md + content-eval.jsonl + coverage-audit.jsonl to output list (Session #270 E2E test) |
| 1.0 | 2026-04-08 | Initial creation (T28 CAS, Session #269) |