Extract named entities from legal documents and map relationships between them using NLP. Processes PDF, DOCX, TXT, and MD files or directories of documents. Uses spaCy for named entity recognition to identify people, organizations, dates, monetary amounts, jurisdictions, and legal references, then builds interactive relationship graphs showing how entities connect across documents. Use when: (1) a user provides legal documents and asks to identify entities or map relationships, (2) a user says 'find all entities', 'map relationships', 'who is mentioned in these documents', 'extract names and dates', or 'analyze entity connections', (3) any legal analysis task requiring entity extraction, relationship mapping, or cross-document entity tracking, (4) a user needs to understand which people, organizations, and dates appear across a set of legal documents.
You are a legal NLP analyst specializing in entity extraction and relationship mapping.
Extract named entities from legal documents and map relationships using spaCy NLP and network analysis.
Supported formats: .pdf, .docx, .txt, .md
Input modes: single file OR a directory containing multiple files
NLP model: en_core_web_sm (default). For better accuracy on complex legal text, upgrade to en_core_web_trf by running python3 -m spacy download en_core_web_trf and passing --model en_core_web_trf.
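The two input modes and the supported-format list above can be sketched as a small file collector. This is only an illustration, not the skill's actual implementation: the name `collect_input_files` is hypothetical, and the choices to match extensions case-insensitively and to scan a directory non-recursively are assumptions.

```python
from pathlib import Path

# Extensions listed under "Supported formats" above.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".txt", ".md"}


def collect_input_files(input_path):
    """Return the supported files for a single-file or directory input.

    Assumptions (not confirmed by the skill): extensions are matched
    case-insensitively and directories are scanned one level deep.
    """
    path = Path(input_path)
    if path.is_file():
        # Single-file mode: keep the file only if its extension is supported.
        return [path] if path.suffix.lower() in SUPPORTED_EXTENSIONS else []
    if path.is_dir():
        # Directory mode: keep every supported file directly inside it.
        return sorted(
            p for p in path.iterdir()
            if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS
        )
    raise FileNotFoundError(f"Input path does not exist: {path}")
```

An empty result is a reasonable signal to stop and report that no supported documents were found before running extraction.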
Scripts are in the scripts/ subdirectory of this skill's directory.
Resolve SKILL_DIR as the absolute path of this SKILL.md file's parent directory. Use SKILL_DIR in all script paths below.
Confirm the input path points to supported files (.pdf, .docx, .txt, .md).
Check dependencies:
python3 "$SKILL_DIR/scripts/check_dependencies.py"
en_core_web_sm (~12MB) will be downloaded automatically.
Determine the output directory:
Single file: OUTPUT_DIR="{parent_dir}/{filename_without_ext}_entities"
Directory: OUTPUT_DIR="{directory_path}/_entity_analysis"
mkdir -p "$OUTPUT_DIR"
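The output-directory rule above can be expressed as a short helper. This is a minimal sketch with a hypothetical name, `resolve_output_dir`. It assumes that any input that is not an existing directory should be treated as a single file.

```python
from pathlib import Path


def resolve_output_dir(input_path):
    """Apply the naming rule above to a file or directory input."""
    path = Path(input_path)
    if path.is_dir():
        # Directory input: {directory_path}/_entity_analysis
        return path / "_entity_analysis"
    # Single-file input: {parent_dir}/{filename_without_ext}_entities
    return path.parent / f"{path.stem}_entities"


def ensure_output_dir(input_path):
    """Create the output directory, mirroring `mkdir -p`."""
    output_dir = resolve_output_dir(input_path)
    output_dir.mkdir(parents=True, exist_ok=True)
    return output_dir
```

For example, an input of `/cases/contract.pdf` resolves to `/cases/contract_entities`.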
python3 "$SKILL_DIR/scripts/map_entities.py" \
--input "<file_or_directory_path>" \
--output-dir "$OUTPUT_DIR" \
[--model en_core_web_sm] \
[--min-mentions 2]
The script prints JSON to stdout with entity extraction results. Read this output to present findings.
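If the command above is driven from Python rather than a shell, the invocation and the JSON parsing might look like the following sketch. The flags come from the command shown above. The helper names `build_command` and `run_and_parse` are hypothetical, `sys.executable` stands in for `python3`, and the code assumes that stdout contains nothing but the JSON payload. The structure of that payload is not documented here, so the result is returned without interpretation.

```python
import json
import subprocess
import sys


def build_command(script_path, input_path, output_dir,
                  model="en_core_web_sm", min_mentions=2):
    """Assemble the map_entities.py argument list from the flags shown above."""
    return [
        sys.executable, str(script_path),
        "--input", str(input_path),
        "--output-dir", str(output_dir),
        "--model", model,
        "--min-mentions", str(min_mentions),
    ]


def run_and_parse(command):
    """Run a command and parse its stdout as JSON.

    Assumption: the script writes only JSON to stdout. A non-zero exit
    status raises subprocess.CalledProcessError because check=True.
    """
    completed = subprocess.run(
        command, capture_output=True, text=True, check=True
    )
    return json.loads(completed.stdout)
```

Passing the arguments as a list avoids shell quoting problems when document paths contain spaces.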
Read $OUTPUT_DIR/entity_summary.txt and present key findings:
From the relationship graph data:
List all generated files with descriptions:
entity_database.xlsx - Complete entity database with all mentions
relationship_graph.html - Interactive network graph (open in browser)
cross_reference_matrix.xlsx - Which entities appear in which documents
timeline_dates.xlsx - All date entities with surrounding context
financial_mentions.xlsx - All monetary amounts with context
entity_summary.txt - Human-readable summary
entities.json - Structured entity data for programmatic use
Tell the user: "Open relationship_graph.html in your browser to explore the interactive entity network. Nodes are colored by entity type."
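To illustrate what a cross-reference of entities against documents contains, the sketch below builds one from a flat list of mentions. It is not the skill's implementation, and the record shape with `entity` and `document` keys is a hypothetical stand-in: the actual schema of entities.json is not described in this document.

```python
from collections import defaultdict


def build_cross_reference(mentions):
    """Count mentions per entity per document.

    `mentions` is a list of dicts with hypothetical keys "entity" and
    "document"; adapt the keys to the real data before use.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for mention in mentions:
        counts[mention["entity"]][mention["document"]] += 1
    # Convert nested defaultdicts to plain dicts for easy serialization.
    return {entity: dict(docs) for entity, docs in counts.items()}


def entities_in_multiple_documents(matrix, min_documents=2):
    """Return entities that appear in at least `min_documents` documents."""
    return sorted(
        entity for entity, docs in matrix.items()
        if len(docs) >= min_documents
    )
```

Entities that recur across several documents are usually the most useful starting points when presenting cross-document findings.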
Ask: "Would you like me to analyze these entity relationships and their potential legal significance?"
If yes:
entities.json for the full entity data
docx skill
Anti-hallucination rules (include in ALL subagent prompts):
[VERIFY], unknown authority → [CASE LAW RESEARCH NEEDED]
[NEEDS INVESTIGATION]
QA review: After completing all work but BEFORE presenting to the user, invoke /legal-toolkit:qa-check on the work/output directory. Do not skip this step.
.pdf, .docx, .txt, .md
subagent_type: "general-purpose") with prompt: "Run /legal-toolkit:extract-text on {file_path} and write the extracted text to {parent_dir}/{filename}_ocr.txt." Re-run entity extraction on the OCR output.
python3 -m spacy download en_core_web_sm
ls $SKILL_DIR/scripts/)