Extract every relevant line value from all user-provided tax documents (W-2s, 1099s, 1098s, 1099-K, etc.) into a structured CSV summary. Runs once at the start of tax filing to build a complete inventory of document values. Triggers on: 'extract my tax docs', 'prepare my tax files', 'let's start my taxes', 'create a tax summary', 'read my W-2s', 'scan my tax documents', 'what do my tax forms say', 'I have my tax docs ready', 'start tax prep', 'pull values from my forms'. Use this skill whenever the user provides tax documents and wants values extracted, or at the beginning of any tax filing workflow — even if they just say something like 'I got my W-2' or 'here are my tax docs'.
Document extraction and inventory — the data intake phase that runs before /tax-cheatsheet. Reads every tax document the user provides, extracts each box/line value, and produces a single CSV at analysis/tax-doc-summary.csv that downstream skills reference.
Every conversation starts here:
- analysis/tax-doc-summary.csv already exists
- /tax-cheatsheet?"

When extracting, consult this table to know which boxes matter for each form. Extract every populated box; the table below highlights the most important ones and their downstream use.
| Form | Key Boxes | Description | Downstream Use |
|---|---|---|---|
| W-2 | 1, 2, 3, 4, 5, 6, 12a–12d, 14, 15–20 | Wages, withholding, Medicare, state/local | 1040 Lines 1a, 25a; Schedule A (SALT); Form 8959 |
| 1099-INT | 1, 3, 8, 13 | Interest income (taxable, savings bond and Treasury, tax-exempt, bond premium on tax-exempt bonds) | 1040 Lines 2a, 2b; Schedule B |
| 1099-DIV | 1a, 1b, 2a, 2b, 5, 7, 12, 13 | Dividends, capital gain distributions, exempt interest | 1040 Lines 3a, 3b; Schedule B; Schedule D |
| 1099-B | 1d, 1e, 1f, 1g | Proceeds, cost basis, accrued market discount, wash sale adj | Schedule D; Form 8949 |
| 1099-K | 1a, 5a–5l | Gross payment amount, monthly breakdown | Schedule C Line 1 |
| 1098 | 1, 2, 4, 5, 6, 10 | Mortgage interest, outstanding principal, refund of overpaid interest, mortgage insurance premiums, points, other (often property tax) | Schedule A Lines 8a–10 |
| 1098-E | 1 | Student loan interest paid | Schedule 1 Line 21 |
| 1098-T | 1, 2, 5 | Tuition, amounts billed, scholarships | Education credits |
| 1099-R | 1, 2a, 2b, 4, 7 | Retirement distributions, taxable amount, distribution code | 1040 Lines 4a–5b |
| 1099-G | 1, 2 | Unemployment compensation, state or local tax refund | Schedule 1 Lines 7, 1 |
| 1099-NEC | 1, 4 | Non-employee compensation, federal tax withheld | Schedule C; 1040 Line 25b |
| 1099-SA | 1, 2, 3 | Gross distribution, earnings on excess contributions, distribution code | Form 8889 |
| 1095-C | 14–16 | Health coverage verification | ACA compliance (reference only) |
| W-2G | 1, 4, 14 | Gambling winnings, federal withholding, state winnings | Schedule 1; 1040 Line 25c |
For 1099-B statements with many transactions (e.g., dozens of stock trades), extract aggregate totals per category (short-term covered, long-term covered, etc.) rather than individual transactions. Record one summary row per category with total proceeds, total basis, and total gain/loss.
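The per-category aggregation described above can be sketched roughly as follows. The transaction keys (`category`, `proceeds`, `basis`) and the function name are hypothetical, chosen only for illustration, and the gain/loss here is simply proceeds minus basis, ignoring any wash sale adjustment (Box 1g). `Decimal` is used so that cent amounts do not pick up floating-point error.

```python
from collections import defaultdict
from decimal import Decimal


def summarize_1099b(transactions):
    """Collapse individual 1099-B trades into one summary row per category.

    Each transaction is a dict with hypothetical keys: 'category',
    'proceeds' (Box 1d) and 'basis' (Box 1e), with amounts given as strings.
    """
    totals = defaultdict(lambda: {"proceeds": Decimal("0"), "basis": Decimal("0")})
    for tx in transactions:
        bucket = totals[tx["category"]]
        bucket["proceeds"] += Decimal(tx["proceeds"])
        bucket["basis"] += Decimal(tx["basis"])

    rows = []
    for category in sorted(totals):
        proceeds = totals[category]["proceeds"]
        basis = totals[category]["basis"]
        rows.append({
            "category": category,
            "total_proceeds": proceeds,
            "total_basis": basis,
            # Simplification: no wash sale or other adjustments applied.
            "total_gain_loss": proceeds - basis,
        })
    return rows


# Made-up sample trades, for illustration only.
trades = [
    {"category": "short-term covered", "proceeds": "1200.50", "basis": "1000.00"},
    {"category": "short-term covered", "proceeds": "300.00", "basis": "450.25"},
    {"category": "long-term covered", "proceeds": "5000.00", "basis": "3500.00"},
]
summary_rows = summarize_1099b(trades)
```

Each entry in `summary_rows` would then map to the summary rows recorded for that category in the CSV.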
For each document the user provides:
- 12345.67). For text fields (distribution codes, checkboxes), record the text value.
- analysis/tax-doc-summary.csv using the format below. Use RFC 4180 quoting for descriptions that contain commas.
- FormType (Name - Payer) e.g., W-2 (Jane Doe - Acme Corp) or 1099-INT (First National Bank). If the name is redacted, use the payer/institution name alone.
- validate_extraction.py (see Validation below)

If a document cannot be read (low-quality scan, password-protected PDF, unsupported format):
Output file: analysis/tax-doc-summary.csv
| Column | Description | Example |
|---|---|---|
| document | Form type with name/payer | W-2 (Jane Doe - Acme Corp) |
| box_or_line | Box or line number | Box 1 |
| description | Human-readable label | Wages, tips, other compensation |
| value | Dollar amount or text value | 125000.00 |
| source_path | Relative path to source file | my-tax-docs/w2-acme-2025.pdf |
See templates/tax-doc-summary-template.csv for a complete example with sample data.
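As a rough illustration of the format above, the sketch below writes one row with Python's standard `csv` module, which by default quotes any field containing a comma in the RFC 4180 style. The helper names `document_label` and `write_summary` are hypothetical and are not part of the skill's scripts.

```python
import csv
import io

COLUMNS = ["document", "box_or_line", "description", "value", "source_path"]


def document_label(form_type, payer, name=None):
    """Build the document column: 'FormType (Name - Payer)', or payer alone if no name."""
    if name:
        return f"{form_type} ({name} - {payer})"
    return f"{form_type} ({payer})"


def write_summary(rows, stream):
    """Write the header and rows; fields containing commas are quoted automatically."""
    writer = csv.writer(stream, lineterminator="\r\n")
    writer.writerow(COLUMNS)
    for row in rows:
        writer.writerow([row[col] for col in COLUMNS])


rows = [{
    "document": document_label("W-2", "Acme Corp", name="Jane Doe"),
    "box_or_line": "Box 1",
    "description": "Wages, tips, other compensation",
    "value": "125000.00",
    "source_path": "my-tax-docs/w2-acme-2025.pdf",
}]

# An in-memory buffer keeps the sketch self-contained.
buffer = io.StringIO()
write_summary(rows, buffer)
csv_text = buffer.getvalue()
```

When writing to the real output file, opening it with `open("analysis/tax-doc-summary.csv", "w", newline="")` avoids doubled line endings on some platforms.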
After extraction is complete, run the validation script to flag anomalies:
python .claude/skills/tax-prep/scripts/validate_extraction.py '{"csv_path": "analysis/tax-doc-summary.csv"}'
The script checks for common issues: zero withholding, Medicare wages below regular wages, missing cost basis on 1099-B, duplicate entries, and more. It outputs JSON with:
- validation_results[]: each check with pass/warning/fail status and detail
- summary: counts of checks passed, warnings, and failures
- document_inventory[]: per-document count of extracted boxes

Interpret the results:
Full rule definitions are in CLAUDE.md. Skill-specific subset below.
- scripts/. Never perform arithmetic in natural language.
- reference/curated/. Format: (Source: filename.md, section). If a curated reference file does not exist yet, say: "I cannot verify this from provided IRS materials; check IRS.gov before relying on this."
- output/ must leave SSN, bank routing, account number, and signature fields BLANK.
- [REDACTED] and note it in the description.

Scripts live in .claude/skills/tax-prep/scripts/. Invoke via:
python .claude/skills/tax-prep/scripts/<script>.py '<json_input>'
Scripts accept a single CLI argument (JSON string) and print a JSON object to stdout. Parse that output — those are the authoritative results.
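The calling convention above (one JSON-string argument in, one JSON object out on stdout) might be wrapped like this. Both helper names are hypothetical, and the `status` key inside each `validation_results` entry is an assumption inferred from the pass/warning/fail description, not a confirmed field name.

```python
import json
import subprocess
import sys


def run_skill_script(script_path, payload):
    """Run a script with one JSON-string argument and parse its JSON stdout."""
    completed = subprocess.run(
        [sys.executable, script_path, json.dumps(payload)],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(completed.stdout)


def checks_needing_attention(result):
    """Return validation checks whose status is not 'pass'.

    Assumes each entry in validation_results carries a 'status' key;
    that key name is a guess, not taken from the script itself.
    """
    return [
        check
        for check in result.get("validation_results", [])
        if check.get("status") != "pass"
    ]
```

A call would then look something like `run_skill_script(".claude/skills/tax-prep/scripts/validate_extraction.py", {"csv_path": "analysis/tax-doc-summary.csv"})`, with the returned dict treated as the authoritative result.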
| Script | Purpose |
|---|---|
| validate_extraction.py | Validate extracted CSV for anomalies and completeness |
- /tax-cheatsheet: line-by-line form guidance using the extracted values (run after this skill)
- /tax-audit: cross-check completed return against extracted source values
- /tax-advisor: next-year tax planning based on this year's data