Name: Tabular Document Review
Author: borghei

python scripts/document_discovery.py /path/to/contracts

python scripts/document_discovery.py /path/to/ndas --types pdf,docx --json

python scripts/document_discovery.py /path/to/leases --types pdf,docx,txt,md --min-size 1024
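The source of `document_discovery.py` is not shown here, so the following is only a minimal sketch of what a discovery pass with `--types` and `--min-size` style filters could look like. The function name `discover_documents`, the recursive scan, and the manifest keys are assumptions made for illustration, not the script's actual interface.

```python
from pathlib import Path


def discover_documents(root, types=("pdf", "docx"), min_size=0):
    """Build a manifest of files under `root` (hypothetical helper).

    `types` mirrors the idea of --types and `min_size` (bytes) the idea of
    --min-size; whether the real script scans recursively is an assumption.
    """
    wanted = {"." + t.lower().lstrip(".") for t in types}
    documents = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        if path.suffix.lower() not in wanted:
            continue
        size = path.stat().st_size
        if size < min_size:
            continue
        documents.append(
            {
                "path": str(path),
                "type": path.suffix.lower().lstrip("."),
                "size_bytes": size,
            }
        )
    # The manifest layout below is assumed, not taken from the real tool.
    return {"root": str(root), "count": len(documents), "documents": documents}
```

A manifest with `count` equal to 0 corresponds to the "Discovery finds 0 documents" case in the troubleshooting table: check the path and the extension list first.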

python scripts/extraction_aggregator.py \
  --results extraction_1.json extraction_2.json extraction_3.json

python scripts/extraction_aggregator.py \
  --results-dir ./extraction_results/ --json

python scripts/extraction_aggregator.py \
  --results-dir ./extraction_results/ \
  --format markdown \
  --output review_matrix.md

python scripts/extraction_aggregator.py \
  --results extraction_1.json extraction_2.json \
  --columns "Parties,Effective Date,Term,Governing Law"
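To make the aggregation step concrete, here is a rough sketch of merging per-document extraction results into one matrix with an optional column filter. The real schema lives in `references/extraction_methodology.md` and is not reproduced here, so the record layout (`document`, `extractions`, `value`, `citation`, `confidence`) and the function names are assumptions.

```python
import json
from pathlib import Path


def load_results(paths):
    """Read extraction JSON files from disk (assumed: one object per file)."""
    return [json.loads(Path(p).read_text(encoding="utf-8")) for p in paths]


def aggregate(results, columns=None):
    """Merge extraction results into {document: {column: cell}}.

    Assumed record layout (hypothetical, not the documented schema):
    {"document": "contract_a.pdf",
     "extractions": {"Term": {"value": "3 years",
                              "citation": "p.3",
                              "confidence": "HIGH"}}}
    """
    matrix = {}
    conflicts = []
    for result in results:
        row = matrix.setdefault(result["document"], {})
        for column, cell in result["extractions"].items():
            if columns is not None and column not in columns:
                continue
            existing = row.get(column)
            if existing is not None and existing["value"] != cell["value"]:
                # Keep the first value and record the cell for manual review.
                conflicts.append((result["document"], column))
                continue
            row[column] = cell
    return matrix, conflicts
```

Keeping the first value and reporting the conflict separately is one possible policy; the actual aggregator may mark conflicts differently.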

| Reference | Purpose |
| --- | --- |
| `references/extraction_methodology.md` | Document extraction best practices, JSON schema, agent prompts |
| `references/common_extraction_columns.md` | Pre-defined column sets for contracts, NDAs, employment, leases |

| Step | Action | Tool | Output |
| --- | --- | --- | --- |
| 1. Gather Requirements | Define document folder, output filename, columns to extract | Manual | Column list, file path |
| 2. Discover Documents | Scan directory for target documents | `document_discovery.py` | Document manifest JSON |
| 3. Process Documents | Extract values per column with citations (parallel agents) | AI agents (external) | Per-document extraction JSONs |
| 4. Collect Results | Aggregate extraction JSONs into unified matrix | `extraction_aggregator.py` | Consolidated matrix |
| 5. Generate Output | Export as markdown table or structured JSON | `extraction_aggregator.py` | Final deliverable |

You are reviewing {count} legal documents. For each document, extract the
following columns:

{column_definitions}

For each value extracted:
1. Provide the exact value found
2. Include the page number (PDF) or section/paragraph (DOCX/MD)
3. Rate your confidence: HIGH (exact match), MEDIUM (inferred), LOW (uncertain)
4. If not found, record "NOT FOUND" with confidence LOW

Output as JSON per the extraction schema.
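The template above has two placeholders, `{count}` and `{column_definitions}`. A small sketch of filling them is given below; the helper name `build_prompt` and the "- name: description" rendering of column definitions are assumptions, since the source does not say how definitions are formatted.

```python
PROMPT_TEMPLATE = """You are reviewing {count} legal documents. For each document, extract the
following columns:

{column_definitions}

For each value extracted:
1. Provide the exact value found
2. Include the page number (PDF) or section/paragraph (DOCX/MD)
3. Rate your confidence: HIGH (exact match), MEDIUM (inferred), LOW (uncertain)
4. If not found, record "NOT FOUND" with confidence LOW

Output as JSON per the extraction schema.
"""


def build_prompt(documents, columns):
    """Fill the template for one agent (hypothetical helper).

    `documents` is the list of files assigned to the agent and `columns`
    maps column name -> what to extract.
    """
    column_definitions = "\n".join(
        f"- {name}: {description}" for name, description in columns.items()
    )
    return PROMPT_TEMPLATE.format(
        count=len(documents), column_definitions=column_definitions
    )
```

For example, `build_prompt(["a.pdf", "b.pdf"], {"Parties": "All contracting parties with full legal names"})` produces a prompt that opens with "You are reviewing 2 legal documents."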

| Level | Color Code | Definition |
| --- | --- | --- |
| HIGH | Green | Exact value found with clear citation |
| MEDIUM | Yellow | Value inferred from context; multiple possible interpretations |
| LOW | Red / Not Found | Value uncertain or not found in document |
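These rules can be checked mechanically before results reach the aggregator. The sketch below validates a single extracted cell against the three confidence levels, the "NOT FOUND implies LOW" rule, and the citation requirement. The cell keys (`value`, `citation`, `confidence`) are assumed for illustration and may differ from the schema in `extraction_methodology.md`.

```python
VALID_CONFIDENCE = {"HIGH", "MEDIUM", "LOW"}


def validate_cell(cell):
    """Return a list of problems found in one extracted cell.

    The cell layout is an assumption: a dict with "value", "citation"
    and "confidence" keys.
    """
    problems = []
    value = cell.get("value")
    confidence = cell.get("confidence")
    if confidence not in VALID_CONFIDENCE:
        problems.append(f"unknown confidence: {confidence!r}")
    if value == "NOT FOUND":
        # Rule 4 of the prompt: missing values are recorded as LOW.
        if confidence != "LOW":
            problems.append("NOT FOUND must be recorded with confidence LOW")
    elif not cell.get("citation"):
        # Every found value needs a page or section reference.
        problems.append("missing citation")
    return problems
```

An empty list means the cell passes; anything else can be sent back to the agent or flagged for manual review.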

| Document | Parties | Effective Date | Term | Governing Law | ... |
| --- | --- | --- | --- | --- | --- |
| contract_a.pdf | Acme / Beta [p.1] | 2026-01-15 [p.2] | 3 years [p.3] | Delaware [p.12] | ... |
| contract_b.pdf | Gamma / Delta [p.1] | NOT FOUND | 2 years [p.4] | New York [p.10] | ... |
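As an illustration of this layout, the sketch below renders a matrix of cells as a markdown table with the citation appended in square brackets and missing cells shown as NOT FOUND. It is not the aggregator's real renderer; the function names and the cell keys (`value`, `citation`) are assumptions.

```python
def format_cell(cell):
    """Render one cell as 'value [citation]' (assumed cell layout)."""
    if cell is None or cell["value"] == "NOT FOUND":
        return "NOT FOUND"
    # Escape pipes and flatten newlines so the table row stays intact.
    text = str(cell["value"]).replace("|", "\\|").replace("\n", " ")
    citation = cell.get("citation")
    return f"{text} [{citation}]" if citation else text


def render_markdown(matrix, columns):
    """Render {document: {column: cell}} as a markdown table string."""
    header = "| " + " | ".join(["Document", *columns]) + " |"
    divider = "| " + " | ".join(["---"] * (len(columns) + 1)) + " |"
    lines = [header, divider]
    for document, row in matrix.items():
        cells = [format_cell(row.get(column)) for column in columns]
        lines.append("| " + " | ".join([document, *cells]) + " |")
    return "\n".join(lines)
```

Escaping pipe characters and newlines addresses one common cause of the "Markdown table misaligned" problem; very long values would still need truncation or JSON output.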

| Column | What to Extract |
| --- | --- |
| Parties | All contracting parties with full legal names |
| Effective Date | Contract effective or execution date |
| Term | Duration of the agreement |
| Renewal | Auto-renewal terms and notice period |
| Governing Law | Jurisdiction governing the agreement |
| Liability Cap | Maximum liability amount or formula |
| Indemnification | Indemnification obligations and scope |
| IP Ownership | Intellectual property ownership provisions |
| Termination Rights | Termination triggers and notice requirements |
| Data Protection | Data protection or privacy obligations |

| Column | What to Extract |
| --- | --- |
| Parties | Disclosing and receiving parties |
| Type | Mutual or one-way |
| Definition Scope | How "confidential information" is defined |
| Exceptions | Standard exceptions to confidentiality |
| Term | Duration of confidentiality obligations |
| Survival | Survival period after termination |
| Return/Destruction | Obligations on termination |
| Remedies | Available remedies for breach |

| Problem | Cause | Solution |
| --- | --- | --- |
| Discovery finds 0 documents | Wrong path or file types | Verify path exists; check `--types` matches actual file extensions |
| Extraction JSONs have wrong schema | Agent prompt incomplete | Use the extraction schema from `extraction_methodology.md` |
| Aggregator shows conflicts | Multiple values for same cell | Review source documents; aggregator marks conflicts for manual review |
| High "NOT FOUND" rate | Columns too specific for document type | Use column definitions from `common_extraction_columns.md`; broaden definitions |
| Confidence all LOW | Agent unable to locate values | Check column definitions are specific enough; verify document is readable |
| Aggregator crashes on large set | Too many result files loaded at once | Process in batches of 50 results; use `--columns` to limit output width |
| Markdown table misaligned | Long values or special characters | Use `--format json` for machine processing; truncate long values |
| Missing citations | Agent did not include page/section references | Reinforce citation requirement in agent prompt; check extraction schema |
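For the "process in batches of 50 results" advice, a generic batching helper is enough to drive the aggregator over one slice of result files at a time. This is a plain Python sketch; `batched` is a hypothetical helper and not part of either script.

```python
from itertools import islice


def batched(items, size=50):
    """Yield successive lists of at most `size` items (hypothetical helper)."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch
```

Each batch of file paths could then be passed to a separate `extraction_aggregator.py --results ...` run, with the per-batch outputs combined afterwards.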

| Anti-Pattern | Why It Fails | Better Approach |
| --- | --- | --- |
| Vague column definitions | "Date" could match dozens of dates in a contract | Use specific definitions: "Effective Date" with guidance on where to look |
| Skipping document discovery | Unknown document count leads to wrong agent allocation | Always run discovery first; use manifest for pipeline planning |
| Ignoring LOW confidence results | Missing or uncertain data treated as fact | Review all LOW confidence cells manually; flag in final report |
| Processing 100+ docs with 1 agent | Slow, context window overflow, quality degradation | Use parallel processing: ceil(N/10) documents per agent, max 10 agents |
| No citation requirement | Cannot verify extracted values against source | Require page/section citation for every extraction; reject uncited values |
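Two of these recommendations, reviewing every LOW confidence cell and rejecting uncited values, lend themselves to a simple automated pass. The sketch below lists the cells that need a human look; the matrix shape `{document: {column: cell}}` and the cell keys are assumptions carried over for illustration.

```python
def cells_needing_review(matrix):
    """List (document, column, value) for LOW-confidence or uncited cells.

    `matrix` is assumed to map document -> column -> cell, where a cell is
    a dict with "value", "citation" and "confidence" keys.
    """
    flagged = []
    for document, row in matrix.items():
        for column, cell in row.items():
            low = cell.get("confidence") == "LOW"
            uncited = not cell.get("citation")
            if low or uncited:
                flagged.append((document, column, cell.get("value")))
    return flagged
```

The returned list can be attached to the final report so that uncertain cells are visibly flagged.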

| Agents | Documents per Agent | Use When |
| --- | --- | --- |
| 1 | All | 1-5 documents |
| 2-3 | ceil(N/agents) | 6-15 documents |
| 4-6 | ceil(N/agents) | 16-40 documents |
| 7-10 | ceil(N/agents) | 41-100 documents |
| 10 (max) | ceil(N/10) | 100+ documents |
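The allocation table can be turned into a small planning helper. Because the table gives ranges of agent counts rather than exact numbers, the sketch below arbitrarily picks the upper end of each range; that choice and the function names are assumptions, not part of the skill.

```python
import math


def choose_agent_count(n_docs):
    """Pick an agent count from the table (upper end of each range, assumed)."""
    if n_docs <= 5:
        return 1
    if n_docs <= 15:
        return 3
    if n_docs <= 40:
        return 6
    return 10  # hard cap of 10 agents


def allocate(documents):
    """Split documents into batches of ceil(N / agents) per agent."""
    n = len(documents)
    if n == 0:
        return []
    per_agent = math.ceil(n / choose_agent_count(n))
    return [documents[i:i + per_agent] for i in range(0, n, per_agent)]
```

Because batch size is rounded up, the number of batches can come out lower than the chosen agent count (41 documents give 9 batches of at most 5), but it never exceeds it.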

| Metric | Value |
| --- | --- |
| Documents processed | 25 |
| Columns extracted | 8 |
| Average confidence | 87% |
| Not found rate | 12% |

Tabular Document Review

Tabular Document Review Skill

Overview

Table of Contents

Tools

1. Document Discovery (`scripts/document_discovery.py`)

2. Extraction Aggregator (`scripts/extraction_aggregator.py`)

Reference Guides

Workflows

5-Step Document Review Pipeline

Parallel Processing Strategy

Agent Prompt Template

Confidence Scoring

Output Format

Extraction Scenarios

Contract Review

NDA Review

Troubleshooting

Success Criteria

Scope & Limitations

Anti-Patterns

Tool Reference

`scripts/document_discovery.py`
