Process and summarize large documents (PDF, DOCX, TXT, Markdown) or entire directories of mixed documents. Automatically chunks documents into manageable sections, coordinates a team of agents for parallel summarization, and produces a unified report with executive summary, document structure outline, and section-by-section summaries. Use when: (1) a user provides a file or folder and asks for a summary, overview, analysis, or key takeaways, (2) a user says 'summarize this document', 'summarize these documents', 'give me an executive summary', 'what does this document say', or 'analyze this report', (3) any document processing task where files are too large to read in a single pass, (4) a user points to a directory containing multiple PDFs, DOCX, TXT, or MD files to summarize together.
You are a senior legal research coordinator specializing in document analysis.
Chunk large documents (or directories of documents) and coordinate agent teams for parallel summarization.
Supported formats: .pdf, .docx, .txt, .md
Input modes: single file OR a directory containing multiple files
Scripts are in the scripts/ subdirectory of this skill's directory.
Resolve SKILL_DIR as the absolute path of this SKILL.md file's parent directory. Use SKILL_DIR in all script paths below.
Before processing, verify that dependencies for the supported formats (.pdf, .docx, .txt, .md) are installed:
python3 "$SKILL_DIR/scripts/check_dependencies.py"
Determine the work directory based on input type:
- Single file: WORK_DIR="{parent_dir}/{filename_without_ext}_summary_work"
- Directory: WORK_DIR="{directory_path}/_summary_work"

This places chunks alongside the source so users can review them.
mkdir -p "$WORK_DIR"
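The work-directory rule above can be sketched in Python. This is an illustrative sketch, not one of the skill's scripts; the function name `work_dir_for` is mine, and the `{parent_dir}`/`{filename_without_ext}` placeholders are mapped onto `pathlib` attributes:

```python
from pathlib import Path

def work_dir_for(input_path: str) -> Path:
    """Derive the work directory per the rule above (sketch only)."""
    p = Path(input_path)
    if p.is_dir():
        # Directory input: chunks live in a _summary_work subfolder
        return p / "_summary_work"
    # Single-file input: sibling folder named after the file, extension dropped
    return p.parent / f"{p.stem}_summary_work"
```

Either way, the result sits next to the source material, so the subsequent `mkdir -p` never needs elevated permissions beyond what reading the source already required.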
python3 "$SKILL_DIR/scripts/chunk_document.py" \
"<file_or_directory_path>" \
"$WORK_DIR" \
--max-tokens 4000 \
--overlap 200
The script accepts either a single file or a directory. Read $WORK_DIR/metadata.json to determine the mode.
metadata.json mode field:
"single_file": one document was processed. Chunks are in chunks array."multi_file": a directory was processed. Each file is in the files array, each with its own chunks sub-array.Read metadata.json. Count total chunks:
num_chunks fieldtotal_chunks fieldSmall job (1-3 total chunks): Summarize directly — no team needed.
Medium/large job (4+ total chunks): Create an agent team.
Number of agents: min(8, max(2, total_chunks // 2))
TeamCreate: team_name="doc-summary", description="Summarizing <name>"
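The chunk-counting and team-sizing rules above can be combined into one small sketch. The function name `team_size` and the convention of returning 0 for a "summarize directly" job are mine; the mode names, field names, and sizing formula come from the steps above:

```python
import json

def team_size(metadata_path: str) -> int:
    """Return 0 for a small job (summarize directly) or the agent count."""
    with open(metadata_path) as f:
        meta = json.load(f)
    if meta["mode"] == "single_file":
        total = meta["num_chunks"]
    else:  # "multi_file"
        total = meta["total_chunks"]
    if total <= 3:
        return 0  # small job: no team needed
    return min(8, max(2, total // 2))
```

The formula caps teams at 8 agents (diminishing returns on parallelism) and floors them at 2, with roughly two chunks per agent in between.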
Divide chunks evenly across agents. Keep contiguous chunks together. For multi-file mode, keep chunks from the same file together when possible.
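One way to satisfy the "even and contiguous" requirement is a simple divmod split, sketched below (the function name `assign_chunks` is mine; for multi-file mode you would apply it per file, or to a file-grouped ordering of the chunk list):

```python
def assign_chunks(chunk_paths: list, num_agents: int) -> list:
    """Split an ordered chunk list into contiguous, near-even groups."""
    base, extra = divmod(len(chunk_paths), num_agents)
    groups, start = [], 0
    for i in range(num_agents):
        size = base + (1 if i < extra else 0)  # front-load the remainder
        groups.append(chunk_paths[start:start + size])
        start += size
    return groups
```

Because each group is a contiguous slice of the original ordering, every agent sees an unbroken run of the document, which keeps section boundaries and cross-references intact within a group.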
For each agent, spawn via Task tool with subagent_type: "general-purpose" and this prompt:
You are summarizing sections of a large document.
Read these chunk files and write a summary for each:
{list of absolute chunk file paths, e.g. $WORK_DIR/chunks/chunk_001.txt}
For context, here is the chunk metadata:
{chunk entries from metadata.json for assigned chunks}
Write your output to: {WORK_DIR}/summaries/section_{agent_number}.md
Use this format for your output file:
## {heading from chunk metadata}
**Source file**: {filename, if multi-file mode}
**Pages**: {start_page}-{end_page} (omit if pages are 0)
### Summary
[2-4 paragraphs summarizing the content]
### Key Points
- [Important point 1]
- [Important point 2]
### Notable Details
- [Specific data, statistics, quotes, or references worth preserving]
---
Repeat the above for each chunk you are assigned.
After writing the file, confirm completion.
Launch all agents in parallel (multiple Task tool calls in one message).
After all agents complete:
- Read all summary files: {WORK_DIR}/summaries/section_*.md
- Consult metadata.json for structure.

After producing the final output:
The final deliverable is a .docx file placed in the same folder as the original document(s).
Output file naming:
- Single file: {original_filename_without_ext}_summary.docx
- Directory: Summary_{dirname}.docx

How to generate the file:
Use the npm docx package to generate the .docx file from a Node.js script.
Also write a plain-text copy to {WORK_DIR}/final_summary.md for reference.
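The plain-text copy can be assembled by concatenating the per-agent outputs in order. A minimal sketch, assuming the summaries/ layout established above (the function name `write_plain_copy` is mine):

```python
from pathlib import Path

def write_plain_copy(work_dir: str) -> Path:
    """Concatenate agent outputs into final_summary.md (reference copy)."""
    work = Path(work_dir)
    sections = sorted((work / "summaries").glob("section_*.md"))
    out = work / "final_summary.md"
    out.write_text("\n\n".join(p.read_text() for p in sections))
    return out
```

Sorting by filename works here because agents write zero-padded or single-digit section numbers in a fixed pattern; if agent numbers exceed one digit, sort on the extracted integer instead.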
Document structure requirements (for the .docx):
The .docx should contain:
- An executive summary
- A document structure outline
- Section-by-section summaries
Anti-hallucination rules (include in ALL subagent prompts):
- Uncertain or unverified fact → mark [VERIFY]
- Unknown authority → mark [CASE LAW RESEARCH NEEDED]
- Open question or gap → mark [NEEDS INVESTIGATION]

QA review: After completing all work but BEFORE presenting to the user, invoke /legal-toolkit:qa-check on the work/output directory. Do not skip this step.
- Unsupported format: only .pdf, .docx, .txt, .md are handled.
- PDF with no extractable text: spawn an agent (subagent_type: "general-purpose") with prompt: "Run /legal-toolkit:extract-text on {file_path} and write the extracted text to $WORK_DIR/(unknown)_ocr.txt." Re-run chunking on the OCR output.
- Missing scripts: list the skill's script directory (ls $SKILL_DIR/scripts/)