Summarize files by reading their content, extracting key passages, and applying type-specific strategies. Activates on requests such as "summarize this file", "what's in this file", "describe this codebase", "file summary", "analyze this file", "tl;dr this file", "what does this code do", "explain this config", and "break down this script". Routes to strategies for code, config, data, documentation, markup, and binary files based on file extension and word count.
Apply this methodology when summarizing files of any type. This skill provides the routing logic and type-specific strategies for faithful file summarization.
Before summarizing any file, the model MUST:

1. Read the file - Use the Read tool to access the actual content. Never guess from the filename.
2. Assess size - Run `$CLAUDE_PLUGIN_ROOT/scripts/file_metrics.py` to determine word count and file type. If the script is unavailable, read the file with the Read tool and estimate the word count from its content (for example, line count multiplied by a sampled average of words per line).
3. Select strategy - Choose based on the size thresholds in the table below.
4. Verify file type - Use the file extension and content inspection to determine which type-specific strategy to apply.
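The type check in step 4 can be sketched as below. This is a minimal illustration, not the behavior of `file_metrics.py`: the extension sets are copied from the type-specific sections of this skill, and the rule that a `.json` file holding a top-level array of objects counts as data is an assumed heuristic for "JSON (when data-structured)". The names `classify` and `extension_of` are hypothetical.

```python
import json
from pathlib import Path
from typing import Optional

# Extension sets copied from the type-specific sections of this skill.
CODE_EXTS = {".py", ".js", ".ts", ".jsx", ".tsx", ".rs", ".go", ".java", ".c",
             ".cpp", ".h", ".rb", ".php", ".swift", ".kt", ".scala", ".sh",
             ".bash", ".zsh"}
CONFIG_EXTS = {".json", ".yaml", ".yml", ".toml", ".ini", ".env", ".conf",
               ".cfg", ".properties"}
DATA_EXTS = {".csv", ".tsv", ".parquet", ".jsonl", ".ndjson"}
DOC_EXTS = {".md", ".rst", ".txt", ".adoc", ".org"}


def extension_of(path: str) -> str:
    p = Path(path)
    # Dotfiles such as ".env" have an empty suffix, so fall back to the name.
    return (p.suffix or p.name).lower()


def classify(path: str, text: Optional[str] = None) -> str:
    """Return 'code', 'config', 'data', 'documentation', or 'binary'."""
    ext = extension_of(path)
    if ext == ".json" and text is not None:
        # Content inspection (hypothetical heuristic): a top-level array of
        # objects is treated as data; anything else stays config.
        try:
            parsed = json.loads(text)
        except json.JSONDecodeError:
            return "config"
        if isinstance(parsed, list) and parsed and all(isinstance(r, dict) for r in parsed):
            return "data"
        return "config"
    for kind, exts in (("code", CODE_EXTS), ("config", CONFIG_EXTS),
                       ("data", DATA_EXTS), ("documentation", DOC_EXTS)):
        if ext in exts:
            return kind
    # .pdf, archives, executables, and unrecognized extensions.
    return "binary"
```

Note that `.pdf` lands in `binary` here; the binary strategy still asks for a text-extraction attempt on PDFs before giving up.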
| File Size | Strategy | Approach |
|---|---|---|
| Small (< 2,000 words) | Full read with extractive summarization | Read entire file, extract key passages, summarize from extracts |
| Medium (2,000-10,000 words) | Section-based extraction | Read full file, identify sections/modules, extract from each section, synthesize |
| Large (> 10,000 words) | Chunk and map-reduce | Split into chunks, summarize each chunk, synthesize chunk summaries |
SOURCE: Size thresholds adapted from Anthropic knowledge-synthesis skill (knowledge-work-plugins repository, accessed 2026-02-06). Strategy patterns informed by Map-Reduce Summarization methodology.
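The routing table and the map-reduce path for large files can be sketched as follows. The thresholds come from the table; the whitespace word count is only a rough stand-in for `file_metrics.py`, and the chunk size (3,000 words) and overlap (200 words) are assumed values, not part of this skill. `summarize` is a hypothetical callable standing in for the model's per-chunk summarization.

```python
from typing import Callable

SMALL_MAX = 2_000    # words; "Small" is < 2,000
MEDIUM_MAX = 10_000  # words; "Medium" is 2,000-10,000


def count_words(text: str) -> int:
    # Whitespace-delimited tokens: a rough stand-in for file_metrics.py.
    return len(text.split())


def select_strategy(word_count: int) -> str:
    if word_count < SMALL_MAX:
        return "full_read"
    if word_count <= MEDIUM_MAX:
        return "section_based"
    return "map_reduce"


def chunk_words(text: str, chunk_size: int = 3_000, overlap: int = 200) -> list[str]:
    """Split text into overlapping windows of words for the map step."""
    words = text.split()
    step = chunk_size - overlap
    # Stop once a new window would fall entirely inside the previous overlap.
    starts = range(0, max(len(words) - overlap, 1), step)
    return [" ".join(words[i:i + chunk_size]) for i in starts]


def map_reduce(text: str, summarize: Callable[[str], str]) -> str:
    # Map: summarize each chunk independently. Reduce: summarize the summaries.
    partials = [summarize(chunk) for chunk in chunk_words(text)]
    return summarize("\n\n".join(partials))
```

The overlap keeps a sentence or definition that straddles a chunk boundary visible to at least one map step in full.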
Code files - file extensions: .py, .js, .ts, .jsx, .tsx, .rs, .go, .java, .c, .cpp, .h, .rb, .php, .swift, .kt, .scala, .sh, .bash, .zsh
The model MUST extract:
- `main()`, CLI argument parsing, exported functions

Extraction method: Read sequentially. Capture top-level definitions with their line numbers. Extract docstrings verbatim. Quote complex logic rather than paraphrasing.
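For Python files specifically, the "top-level definitions with line numbers, docstrings verbatim" step can be sketched with the standard `ast` module. This covers Python only; other languages in the list would need their own parsers. The helper names `outline_python` and `imports_python` are hypothetical.

```python
import ast


def outline_python(source: str) -> list[dict]:
    """Top-level functions and classes with line ranges and raw docstrings."""
    items = []
    for node in ast.parse(source).body:  # top-level statements only
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            items.append({
                "kind": "class" if isinstance(node, ast.ClassDef) else "function",
                "name": node.name,
                # end_lineno is available on AST nodes from Python 3.8.
                "lines": (node.lineno, node.end_lineno),
                # clean=False keeps the docstring's original indentation.
                "docstring": ast.get_docstring(node, clean=False),
            })
    return items


def imports_python(source: str) -> list[str]:
    """Module names from top-level import statements (dependencies)."""
    names = []
    for node in ast.parse(source).body:
        if isinstance(node, ast.Import):
            names.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            names.append(node.module)
    return names
```

The `lines` tuples map directly onto entries like "Class `AuthClient` (lines 15-87)" in the example summary.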
Example summary structure:
## Summary
Python module for HTTP client authentication. Implements JWT token refresh flow with retry logic. Exports `AuthClient` class and `refresh_token()` function.
## What Was Found
- Class `AuthClient` (lines 15-87): JWT-based HTTP client with automatic token refresh
- Function `refresh_token()` (lines 92-105): Retries up to 3 times on 401 errors
- Dependencies: `httpx`, `jwt`, `tenacity` (lines 1-3)
- Environment variables: `AUTH_BASE_URL`, `AUTH_CLIENT_ID` (lines 10-11)
## What Was NOT Found
- No test coverage information in this file
- No error handling for network failures
- Configuration schema not documented
Configuration files - file extensions: .json, .yaml, .yml, .toml, .ini, .env, .conf, .cfg, .properties
The model MUST extract:
Extraction method: Parse structure. For small files, include all keys. For large files, sample representative sections and note structure patterns.
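A minimal sketch of "parse structure" for a JSON config, assuming nested objects should be reported as dotted keys such as `database.host` (the notation used in the example below). YAML and TOML would first need loading into the same dict shape, for example with PyYAML's `yaml.safe_load` or `tomllib.loads` (Python 3.11+). `flatten_keys` and `describe_config` are hypothetical names.

```python
import json


def flatten_keys(obj: dict, prefix: str = "") -> dict:
    """Map nested mappings to dotted keys: {"database": {"host": "db"}} -> {"database.host": "db"}."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict) and value:
            flat.update(flatten_keys(value, path))
        else:
            flat[path] = value  # scalars, lists, and empty tables are leaves
    return flat


def describe_config(text: str) -> dict:
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("expected a top-level object")
    flat = flatten_keys(data)
    return {
        "key_count": len(flat),
        "top_level_keys": list(data),  # preserves file order
        "keys": flat,
    }
```

The `key_count` and `top_level_keys` values feed summary lines such as "47 configuration keys across 5 top-level sections".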
Example summary structure:
## Summary
Application configuration in YAML format. Defines database connection, API endpoints, feature flags, and logging settings. 47 configuration keys across 5 top-level sections.
## What Was Found
- `database.host`, `database.port`, `database.name` (lines 2-4): PostgreSQL connection settings
- `api.base_url`, `api.timeout` (lines 7-8): External API configuration
- `features.experimental_mode: false` (line 12): Feature flag for beta features
- `logging.level: INFO`, `logging.format` (lines 15-16): Logging configuration
## What Was NOT Found
- No schema validation rules present
- No environment-specific overrides documented
- API authentication credentials not in this file
Data files - file extensions: .csv, .tsv, .parquet, .json (when data-structured), .jsonl, .ndjson
The model MUST extract:
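Extraction method: For CSV/TSV, read the header row and the first 10 data rows. Report whole-file statistics (row counts, value ranges, missing values) only when they were computed over the full file. For Parquet, note that binary inspection is limited. For JSON, inspect the array structure.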
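The CSV/TSV preview step might look like the sketch below, which uses only the standard `csv` module. The functions take any iterable of lines, so they work on an open file or a list of strings; choosing the tab delimiter from a `.tsv` extension is an assumption. `count_records` makes a separate full pass, which is what a whole-file row count requires. All helper names are hypothetical.

```python
import csv
import itertools
from pathlib import Path
from typing import Iterable


def delimiter_for(path: str) -> str:
    return "\t" if Path(path).suffix.lower() == ".tsv" else ","


def preview_rows(lines: Iterable[str], delimiter: str = ",", n_rows: int = 10) -> dict:
    """Header plus the first n_rows data rows; reads no further than needed."""
    reader = csv.reader(lines, delimiter=delimiter)
    header = next(reader, [])
    sample = list(itertools.islice(reader, n_rows))
    return {"columns": header, "column_count": len(header), "sample_rows": sample}


def count_records(lines: Iterable[str], delimiter: str = ",") -> int:
    """Full pass: number of data records, excluding the header row."""
    total = sum(1 for _ in csv.reader(lines, delimiter=delimiter))
    return max(total - 1, 0)


def preview_file(path: str, n_rows: int = 10) -> dict:
    # newline="" lets the csv module handle quoted fields containing newlines.
    with open(path, newline="", encoding="utf-8") as fh:
        return preview_rows(fh, delimiter_for(path), n_rows)
```

Counting parsed records rather than raw lines matters when quoted fields span multiple lines; in that case the record count and the line count diverge.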
Example summary structure:
## Summary
CSV file containing user activity logs. 1,247 rows with 8 columns. Timestamps range from 2025-01-01 to 2026-02-06. No missing values detected.
## What Was Found
- Column `user_id` (integer): User identifiers, range 1001-5432
- Column `timestamp` (ISO 8601): Activity timestamps
- Column `action` (string): Values include "login", "logout", "view_page", "click_button"
- Column `duration_ms` (integer): Range 0-45000
- 1,247 total records (line count: 1,248 including header)
## What Was NOT Found
- No schema documentation in file
- Column `referrer` is present but not documented
- No indication of data collection methodology
Documentation files - file extensions: .md, .rst, .txt, .adoc, .org
The model MUST extract:
Extraction method: Read sequentially. Extract headings to build table of contents. Quote key passages that define core concepts. Note code examples.
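For Markdown specifically, building the table of contents and counting code examples can be sketched with two regular expressions. This is a simplification of CommonMark: it handles ATX (`#`) headings only, ignores setext headings, and treats any line starting with a backtick or tilde fence as a toggle without matching fence lengths. The function name `outline_markdown` is hypothetical.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.*?)\s*#*\s*$")
FENCE = re.compile(r"^\s*(```|~~~)")


def outline_markdown(text: str) -> dict:
    """(level, title, line number) for each heading, plus fenced block count."""
    toc, code_blocks, in_fence = [], 0, False
    for lineno, line in enumerate(text.splitlines(), start=1):
        if FENCE.match(line):
            if not in_fence:
                code_blocks += 1  # count each block once, at its opening fence
            in_fence = not in_fence
            continue
        if in_fence:
            continue  # '#' inside code (e.g. shell comments) is not a heading
        m = HEADING.match(line)
        if m:
            toc.append((len(m.group(1)), m.group(2), lineno))
    return {"toc": toc, "code_blocks": code_blocks}
```

Skipping lines inside fences matters for deployment guides, where shell examples often contain `#` comments that would otherwise be mistaken for headings.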
Example summary structure:
## Summary
User guide for deploying containerized applications. Covers Docker setup, image building, registry configuration, and troubleshooting. 5 main sections with 23 subsections. Includes 12 shell command examples.
## What Was Found
- Section "Getting Started" (lines 10-45): Docker installation on Linux and macOS
- Section "Building Images" (lines 47-89): Dockerfile syntax and multi-stage builds
- Section "Troubleshooting" (lines 200-245): Common errors with solutions
- 12 shell command examples throughout document
## What Was NOT Found
- No Windows deployment instructions
- Security best practices not covered
- Performance tuning section mentioned but not written (line 15: "TODO")
Binary files - file extensions: .pdf, .zip, .tar, .gz, .bin, .exe, .so, .dylib, .dll, or unrecognized extensions
The model MUST:

1. Attempt to read - Use the Read tool. If the tool returns binary content or an error, note this.
2. State limitation - Do NOT guess contents. State: "Binary file, cannot extract text content."
3. Provide file metadata - File size, extension, location.

For PDFs: Use the Read tool with the `pages` parameter to extract text from specific page ranges. Summarize the text content if extraction succeeds.
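The three steps above can be sketched as a NUL-byte check on the file head followed by a metadata-only report. The NUL-byte heuristic and the 8 KB probe size are assumptions, not part of this skill; the heuristic misfires on UTF-16 text, and PDFs should still go through the `pages` extraction path first. Sizes use 1024-byte units. All names are hypothetical.

```python
from pathlib import Path, PurePath
from typing import Optional

PROBE_BYTES = 8192  # assumption: how much of the file head to inspect


def is_probably_binary(head: bytes) -> bool:
    # Heuristic: text files rarely contain NUL bytes (UTF-16 text is an exception).
    return b"\x00" in head


def format_size(n_bytes: int) -> str:
    size = float(n_bytes)
    for unit in ("B", "KB", "MB"):
        if size < 1024:
            return f"{size:.0f} {unit}" if unit == "B" else f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} GB"


def binary_report(path: str, size_bytes: int) -> str:
    """Metadata-only summary; never guesses at the contents."""
    ext = PurePath(path).suffix or "(none)"
    return "\n".join([
        "## Summary",
        "Binary file, cannot extract text content.",
        "## What Was Found",
        f"- File path: {path}",
        f"- File size: {format_size(size_bytes)}",
        f"- Extension: {ext}",
        "## What Was NOT Found",
        "- Unable to determine contents without binary inspection tools.",
    ])


def inspect_file(path: str) -> Optional[str]:
    """Return a binary report, or None if the file looks like text."""
    p = Path(path)
    with p.open("rb") as fh:
        head = fh.read(PROBE_BYTES)
    if is_probably_binary(head):
        return binary_report(path, p.stat().st_size)
    return None
```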
Example for unreadable binary:
## Summary
Binary file, cannot extract text content.
## What Was Found
- File path: ./build/output.bin
- File size: 2.3 MB
- Extension: .bin
## What Was NOT Found
- Unable to determine contents without binary inspection tools.
## Uncertain
File may be compiled binary, compressed archive, or proprietary format.
For all text-based files, the model MUST apply the quote-grounding technique:
SOURCE: Technique adapted from Fidelity Rules Rule 2 (lines 27-41).
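The full quote-grounding technique is defined in Fidelity Rules; one mechanical check it suggests can be sketched here, assuming quotes in a summary are written in double quotes. The sketch flags any quoted passage that does not appear verbatim in the source, after collapsing whitespace so quotes copied from wrapped lines still match. The regex and the name `ungrounded_quotes` are hypothetical.

```python
import re

# Double-quoted passages of 3+ characters on a single line (assumed quoting style).
QUOTED = re.compile(r'"([^"\n]{3,})"')


def normalize(text: str) -> str:
    # Collapse whitespace so a quote that spans wrapped source lines still matches.
    return " ".join(text.split())


def ungrounded_quotes(summary: str, source: str) -> list[str]:
    """Quoted passages in the summary that do not appear verbatim in the source."""
    haystack = normalize(source)
    return [q for q in QUOTED.findall(summary) if normalize(q) not in haystack]
```

An empty result means every quote was found in the file; anything returned should be re-checked or paraphrased without quotation marks.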
All file summaries MUST use the structured output format defined in Structured Summary.
Required sections:
- `source_type: file`, `source_path`, `method`, `confidence`, word counts

The model MUST follow all fidelity rules defined in Fidelity Rules.
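The metadata fields named in the required sections could be carried as a small record, sketched below. The authoritative layout lives in Structured Summary; the YAML-style front matter, the `high`/`medium`/`low` confidence scale, and the split of "word counts" into `source_word_count` and `summary_word_count` are all assumptions made for illustration, as is the class name.

```python
from dataclasses import dataclass


@dataclass
class FileSummaryMetadata:
    source_path: str
    method: str               # e.g. "full_read", "section_based", "map_reduce"
    confidence: str           # hypothetical scale: "high" / "medium" / "low"
    source_word_count: int    # hypothetical split of "word counts"
    summary_word_count: int
    source_type: str = "file"

    def render(self) -> str:
        """Emit the fields as a YAML-style front-matter block (assumed layout)."""
        order = ["source_type", "source_path", "method", "confidence",
                 "source_word_count", "summary_word_count"]
        body = [f"{name}: {getattr(self, name)}" for name in order]
        return "\n".join(["---", *body, "---"])
```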
Critical rules for file summarization:
When the user requests summarization of multiple files:
SOURCE: Multi-source synthesis approach from Summarizer lines 33-37.
If a file cannot be read:
`../summarizer/templates/{format_id}.md` (default: `structured`). The template defines the schema, required sections, and fidelity constraints for the selected format.

The model MUST NOT: