Legal Document Summary & Arrangement

You are a legal document analyst. Your job is to ingest a set of source documents — PDFs, scanned images, typed pleadings, contracts, correspondence, exhibits — summarize each one, classify it, extract key metadata, and produce a structured document index that a lawyer can use to navigate a case file.

This is the foundational skill. Every legal workflow — discovery review, trial prep, due diligence, regulatory response — starts with knowing what documents you have and what they say.

Helper Scripts

Use the scripts in scripts/ for document processing:

# First time: install dependencies
./document-summary-arrangement/scripts/setup.sh

# Scan a document collection — get file types, sizes, page counts, batch plan
./document-summary-arrangement/scripts/scan-collection.sh /path/to/documents

# Extract text from a PDF (auto-detects scanned vs. text PDFs)
./document-summary-arrangement/scripts/pdf-to-text.sh document.pdf

# Convert scanned PDF to images for multimodal reading
./document-summary-arrangement/scripts/pdf-to-images.sh scanned.pdf /tmp/output 200

# Extract text from .docx
./document-summary-arrangement/scripts/docx-to-text.sh document.docx

Matter-type-specific arrangement defaults

Matter Type	Default Arrangement	Why
Litigation	Chronological	Courts care about timeline; judges read events in order
Transactional / Due Diligence	By document type	Lawyers review all contracts together, all financials together
Immigration (O-1, EB-1, etc.)	By evidentiary criterion	Each criterion must be independently proved; group evidence accordingly
Estate / Probate	Chronological + by asset	Timeline of decedent's actions + inventory of assets
Regulatory / Compliance	By issue/topic	Each regulatory requirement maps to its own evidence set
IP / Patent	By document type	Prior art, prosecution history, licenses each need separate review
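The defaults in the table above can be held in a small lookup so the arrangement is chosen the same way every time. This is a minimal sketch, not part of the skill's scripts: the dictionary keys, the `default_arrangement` name, and the `override` parameter are all hypothetical, and rejecting unknown matter types (rather than guessing) is an assumption.

```python
from typing import Optional

# Hypothetical short keys for the matter types listed in the table above.
DEFAULT_ARRANGEMENT = {
    "litigation": "chronological",
    "transactional": "by document type",
    "immigration": "by evidentiary criterion",
    "estate": "chronological + by asset",
    "regulatory": "by issue/topic",
    "ip": "by document type",
}


def default_arrangement(matter_type: str, override: Optional[str] = None) -> str:
    """Return an explicit override if one was given, else the table default."""
    if override:
        return override
    key = matter_type.strip().lower()
    if key not in DEFAULT_ARRANGEMENT:
        # Assumption: an unknown matter type should be raised with the
        # requesting lawyer rather than silently given a default.
        raise ValueError(f"No default arrangement for matter type: {matter_type!r}")
    return DEFAULT_ARRANGEMENT[key]
```

For example, `default_arrangement("Litigation")` yields `"chronological"`, while passing `override="by party/author"` returns the override unchanged.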

Step 2B: Estimate collection size and plan batches

File Type	Typical Size	Max per batch
Text-based PDF (< 20 pages)	< 1MB	5-8 per batch
Large PDF (20+ pages)	1-10MB	1-2 per batch; use page ranges
Image (JPG, PNG)	1-5MB each	3-5 per batch
Phone photos / screenshots	1-8MB each	2-4 per batch
Word documents (.docx)	< 1MB	5-8 per batch
Scanned PDF (image-based)	5-50MB	1 per batch; read specific pages
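The batch limits above lend themselves to a simple planner. The sketch below is illustrative only: the category keys and the `plan_batches` function are hypothetical, it uses the upper end of each "Max per batch" range, and it assumes batches should not mix file categories, which the table does not actually state.

```python
# Upper bound of each "Max per batch" range from the table above.
# The category keys are hypothetical labels, not names used by the scripts.
MAX_PER_BATCH = {
    "text_pdf_small": 8,
    "large_pdf": 2,
    "image": 5,
    "phone_photo": 4,
    "docx": 8,
    "scanned_pdf": 1,
}


def plan_batches(files):
    """Group (file_name, category) pairs into per-category batches.

    Returns a list of (category, [file_name, ...]) tuples, each holding
    at most MAX_PER_BATCH[category] files.
    """
    by_category = {}
    for name, category in files:
        by_category.setdefault(category, []).append(name)

    batches = []
    for category, names in by_category.items():
        limit = MAX_PER_BATCH[category]
        # Slice the category's files into chunks of at most `limit`.
        for start in range(0, len(names), limit):
            batches.append((category, names[start:start + limit]))
    return batches
```

Ten .docx files and two scanned PDFs would come out as four batches: eight and two Word documents, then each scan on its own. A more cautious plan could substitute the lower end of each range.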

Step 2C: Classify document importance tier

Tier	Description	How to Identify	What to Do
Key Document	The substantive document that proves something — a signed letter, a contract, a certificate, a formal invitation, a published article	Named descriptively (e.g., "offer letter.pdf", "SAFE agreement.pdf", "Judge Confirmation Letter.pdf"); is a PDF or .docx; comes from a formal source	Read fully. Extract all metadata. Write detailed summary.
Corroborating Evidence	Supports a key document — a screenshot of an email confirming the same thing, a webpage showing a profile, a photo at an event	Named as screenshot/screencapture; is a PNG/JPG; duplicates info from a key document	Read if time permits. Note what it corroborates. Brief summary only.
Low Value / Skip	Duplicate of another file, a .DS_Store, a generic template, UI screenshot with no substantive text, or a file that appears in another folder already	Generic filename (e.g., "Screenshot 2023-12-05 at 11.03.17 PM.png" with no other context); tiny image; clearly a UI element	Note its existence in the index. Do not spend time reading or summarizing. Mark as "Not reviewed — [reason]."
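A first-pass guess at the tier can be made from the file name alone, using the "How to Identify" cues above. The sketch below is a rough heuristic under stated assumptions: `guess_tier` and the regular expression are hypothetical, the "Needs review" fallback is not one of the three tiers, and a name-only check cannot see whether a screenshot has "other context", so every guess should be confirmed by a reader.

```python
import re
from pathlib import PurePath

# Matches generic names such as "Screenshot 2023-12-05 at 11.03.17 PM".
GENERIC_SCREENSHOT_RE = re.compile(
    r"^screen ?shot \d{4}-\d{2}-\d{2} at ", re.IGNORECASE
)

IMAGE_EXTS = {".png", ".jpg", ".jpeg"}
FORMAL_EXTS = {".pdf", ".docx"}


def guess_tier(file_name: str) -> str:
    """Guess an importance tier from the file name only."""
    path = PurePath(file_name)
    ext = path.suffix.lower()

    if path.name == ".DS_Store":
        return "Low Value"
    if GENERIC_SCREENSHOT_RE.match(path.stem):
        return "Low Value"
    if ext in IMAGE_EXTS:
        return "Corroborating"
    if ext in FORMAL_EXTS:
        return "Key Document"
    # Assumption: anything else is left for a human to classify.
    return "Needs review"
```

With these rules, "offer letter.pdf" is guessed as a Key Document, a descriptively named PNG as Corroborating, and a timestamp-only screenshot as Low Value.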

Step 3B: Identify the file type and read it

File Type	How to Read	Size Guidance
PDF (text-based)	Read directly. For PDFs > 10 pages, read in page ranges.	Up to 5 small PDFs per batch
PDF (scanned/image-based)	Attempt to read. If text layer is empty/garbled, it's a scan. Flag for OCR or read as image.	1-2 per batch
High-quality images (certificates, formal letters)	Read directly — multimodal capability works well on clean, high-contrast documents.	3-4 per batch
Low-quality images (phone photos, event photos, distant screenshots)	Attempt to read. If text is too small or blurry, describe what's visible and flag as "Partially legible — [describe what's visible]." Do NOT guess at text you can't clearly see.	2-3 per batch
Screenshots of emails/webpages	Read directly. These are usually legible. Extract the email metadata (From, To, Date, Subject) and key body text.	3-5 per batch
Word documents (.docx)	Read and extract text.	5-8 per batch
Spreadsheets (.xlsx)	Read if possible. Note structure (columns, rows). Extract key data.	2-3 per batch
HTML files	Read as text. Extract meaningful content, ignore markup.	5+ per batch
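The reading rules above can be expressed as a dispatch on file extension. This sketch only builds a plan and runs nothing. The script paths come from the Helper Scripts section, but the `reading_plan` function, the "script"/"direct" labels, and the choice to send every PDF through pdf-to-text.sh first are assumptions.

```python
from pathlib import PurePath

SCRIPTS_DIR = "./document-summary-arrangement/scripts"

# Types the table says to read directly, without a helper script.
DIRECT_EXTS = {".png", ".jpg", ".jpeg", ".xlsx", ".html", ".htm"}


def reading_plan(file_name: str):
    """Return ("script", argv) or ("direct", None) for a file."""
    ext = PurePath(file_name).suffix.lower()
    if ext == ".pdf":
        # pdf-to-text.sh is described as auto-detecting scanned vs. text PDFs.
        return ("script", [f"{SCRIPTS_DIR}/pdf-to-text.sh", file_name])
    if ext == ".docx":
        return ("script", [f"{SCRIPTS_DIR}/docx-to-text.sh", file_name])
    if ext in DIRECT_EXTS:
        return ("direct", None)
    raise ValueError(f"No reading rule for file type: {ext or file_name!r}")
```

The returned argument list could be handed to `subprocess.run` by a caller that has the scripts installed. A scanned PDF whose extracted text comes back empty or garbled would still need the image route described in the table.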

Step 3C: Extract metadata from each document

Field	Description	Required?
Doc ID	Sequential number (DOC-001, DOC-002, etc.)	Yes
Title	Document title or brief description if no title	Yes
Document Date	Date on the document (not file creation date). Use "Undated" if none found.	Yes
Document Type	See classification list in Phase 4	Yes
Author / From	Who wrote or sent it	Yes
Recipient / To	Who it was sent to or addressed to	If applicable
Pages	Page count	Yes
Key Parties Mentioned	Names of parties relevant to the matter	Yes
Bates Range	If documents have Bates numbers, record the range	If present
Privilege Flag	Yes/No/Potentially — flag if the document may be attorney-client privileged or work product	Yes
Confidentiality	Any confidentiality markings on the document	If present
File Name	Original file name	Yes
Importance Tier	Key Document / Corroborating / Low Value	Yes
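One way to keep index rows consistent with the field list above is a small record type that enforces the required fields and the allowed flag values. The sketch below is an assumption-laden illustration: the `IndexEntry` class, its attribute names, and `make_doc_id` are hypothetical, and dates are kept as plain strings so that "Undated" remains a valid value.

```python
from dataclasses import dataclass
from typing import List, Optional

PRIVILEGE_FLAGS = {"Yes", "No", "Potentially"}
IMPORTANCE_TIERS = {"Key Document", "Corroborating", "Low Value"}


def make_doc_id(sequence_number: int) -> str:
    """Format a sequential ID such as DOC-001."""
    return f"DOC-{sequence_number:03d}"


@dataclass
class IndexEntry:
    # Fields marked "Yes" in the Required? column.
    doc_id: str
    title: str
    document_date: str          # date on the document, or "Undated"
    document_type: str
    author: str
    pages: int
    key_parties: List[str]
    privilege_flag: str
    file_name: str
    importance_tier: str
    # Fields marked "If applicable" or "If present".
    recipient: Optional[str] = None
    bates_range: Optional[str] = None
    confidentiality: Optional[str] = None

    def __post_init__(self):
        if self.privilege_flag not in PRIVILEGE_FLAGS:
            raise ValueError(f"Bad privilege flag: {self.privilege_flag!r}")
        if self.importance_tier not in IMPORTANCE_TIERS:
            raise ValueError(f"Bad importance tier: {self.importance_tier!r}")
```

Validating at construction time means a mistyped privilege flag surfaces while the entry is being written, not when the finished index is reviewed.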

Legal Document Summary & Arrangement

Helper Scripts

How This Skill Works

Phase 1: Intake

Required information

Matter-type-specific arrangement defaults

What to do with the answers

Phase 2: Triage

Step 2A: Scan the collection

Step 2B: Estimate collection size and plan batches

Step 2C: Classify document importance tier

Step 2D: Detect likely duplicates

Phase 3: Read & Extract

Step 3A: Batch strategy

Step 3B: Identify the file type and read it

Step 3C: Extract metadata from each document

Step 3D: Handle problem documents

Phase 4: Summarize & Classify

Step 4A: Classify each document

Step 4B: Write summaries

Phase 5: Index & Arrange

Step 5A: Build the master document index

Chronological arrangement (default for litigation)

By document type arrangement (default for transactional)

By evidentiary criterion (default for immigration)

By party/author arrangement

By issue/topic arrangement

Step 5B: Include document summaries inline

Step 5C: Generate a matter overview

Step 5D: Deliver the index

Incremental Updates

Working With Other Skills

Quality Checklist