Understand data structures in the GCACW parser pipeline (raw tables, parsed units, web JSON). Use when debugging data issues, understanding field mappings between stages, checking column configurations, or investigating why units are missing or malformed. Includes inspection utilities (inspect_raw.py, inspect_parsed.py, compare_data.py). Essential for work involving raw_table_extractor.py, parse_raw_tables.py, convert_to_web.py, or game_configs.json.
Documentation for the three-stage data pipeline used by the GCACW scenario parser.
PDF → raw_table_extractor.py → raw/{game}_raw_tables.json (snake_case)
↓
parse_raw_tables.py → parsed/{game}_parsed.json (snake_case)
↓
convert_to_web.py → web/public/data/{game}.json (camelCase)
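The file naming in the diagram above can be sketched as a small helper. This is only an illustration: `stage_paths` and its `root` parameter are hypothetical names, and the directory that `raw/`, `parsed/` and `web/` are relative to is an assumption, not something the pipeline scripts are known to expose.

```python
from pathlib import Path


def stage_paths(game_id: str, root: Path = Path(".")) -> dict[str, Path]:
    """Return the output file of each pipeline stage for one game.

    Hypothetical helper: the path patterns come from the diagram above,
    but `root` (where those directories live) is an assumption.
    """
    return {
        "raw": root / "raw" / f"{game_id}_raw_tables.json",
        "parsed": root / "parsed" / f"{game_id}_parsed.json",
        "web": root / "web" / "public" / "data" / f"{game_id}.json",
    }


if __name__ == "__main__":
    # "example_game" is a placeholder, not a real game id.
    for stage, path in stage_paths("example_game").items():
        status = "found" if path.exists() else "missing"
        print(f"{stage:>6}: {path} ({status})")
```

Checking which of the three files exists is a quick way to see which stage a problem first appears in.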
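Use this skill when:
- Debugging data issues
- Understanding field mappings between stages
- Checking column configurations in game_configs.json
- Investigating why units are missing or malformed

File: raw/{game}_raw_tables.json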
Source: raw_table_extractor.py
Structure preserves PDF extraction exactly as read:
- confederate_tables and union_tables
- header_row, rows[][], and annotations{}
- Cell values kept as extracted, footnote markers included (e.g. "2*")

File: parsed/{game}_parsed.json
Source: parse_raw_tables.py
Structured unit objects with semantic fields:
- unit_leader, size, command, unit_type, manpower_value, hex_location
- notes[] array
- Side given as "Confederate" or "Union"
- turn, reinforcement_set, table_name

File: web/public/data/{game}.json
Source: convert_to_web.py
Frontend-ready format:
- camelCase field names (e.g. hexLocation, manpowerValue)
- confederateGunboats / unionGunboats
- Type definitions: web/src/types.ts

Three utility scripts help debug data at each stage:
# View raw table data
cd parser && uv run python utils/inspect_raw.py <game_id>
cd parser && uv run python utils/inspect_raw.py <game_id> --scenario 1 --table "Confederate Set-Up"
# View parsed units
cd parser && uv run python utils/inspect_parsed.py <game_id>
cd parser && uv run python utils/inspect_parsed.py <game_id> --scenario 1 --side Confederate
# Compare raw vs parsed (for debugging parsing issues)
cd parser && uv run python utils/compare_data.py <game_id> <scenario>
cd parser && uv run python utils/compare_data.py <game_id> <scenario> --side Union --table "Union Set-Up"
All scripts support --help for detailed usage.
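For ad hoc checks beyond the utility scripts, a raw scenario can also be summarized directly in Python. The sketch below assumes that `confederate_tables` and `union_tables` are lists of table objects carrying `header_row`, `rows` and `annotations`. The `name` key, the header labels and every sample value are hypothetical, so adjust them to what the raw file actually contains.

```python
from typing import Any

# Hypothetical sample shaped like one scenario in a raw tables file.
SAMPLE_SCENARIO: dict[str, Any] = {
    "confederate_tables": [
        {
            "name": "Confederate Set-Up",  # assumed key for the table title
            "header_row": ["Unit/Leader", "Size", "Command", "Type", "Manpower Value", "Hex"],
            "rows": [
                ["Example Leader", "Div", "I", "Inf", "2*", "S1234"],
                ["Truncated Row", "Bde"],  # deliberately too short
            ],
            "annotations": {"*": "hypothetical footnote text"},
        }
    ],
    "union_tables": [],
}


def summarize_tables(scenario: dict[str, Any]) -> list[dict[str, Any]]:
    """Report column/row counts and rows whose width differs from the header."""
    report = []
    for side_key in ("confederate_tables", "union_tables"):
        for table in scenario.get(side_key, []):
            width = len(table["header_row"])
            ragged = [i for i, row in enumerate(table["rows"]) if len(row) != width]
            report.append(
                {
                    "side_key": side_key,
                    "name": table.get("name", "?"),
                    "columns": width,
                    "rows": len(table["rows"]),
                    "ragged_rows": ragged,
                }
            )
    return report


if __name__ == "__main__":
    for entry in summarize_tables(SAMPLE_SCENARIO):
        print(entry)
```

Rows listed under `ragged_rows` have a different number of cells than the header, which is a common first clue when a unit goes missing or comes out malformed after parsing.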
- uv run python inspect_raw.py <game> --scenario N
- game_configs.json
- utils/diagnose_pdf.py
- uv run python compare_data.py <game> <scenario>
- game_configs.json
- header_row
- footnote_symbols list in config

| Concept | Raw | Parsed | Web JSON |
|---|---|---|---|
| Unit name | values[0...n] | unit_leader | name |
| Manpower | values[n] | manpower_value | manpowerValue |
| Hex | values[n] | hex_location | hexLocation |
| Type | values[n] | unit_type | type |
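
The parsed-to-web column of the table can be sketched as a key-renaming step. Note that `unit_leader` and `unit_type` do not simply become camelCase, so they need explicit entries. The names `PARSED_TO_WEB`, `snake_to_camel` and `to_web_unit` are hypothetical, and falling back to plain camelCase for fields not listed in the table is an assumption. How convert_to_web.py actually handles them is not shown here.

```python
# Explicit renames taken from the mapping table above.
PARSED_TO_WEB = {
    "unit_leader": "name",
    "manpower_value": "manpowerValue",
    "hex_location": "hexLocation",
    "unit_type": "type",
}


def snake_to_camel(key: str) -> str:
    """Convert snake_case to camelCase, e.g. reinforcement_set -> reinforcementSet."""
    head, *tail = key.split("_")
    return head + "".join(part.capitalize() for part in tail)


def to_web_unit(parsed_unit: dict) -> dict:
    """Rename parsed keys to web keys; unlisted keys fall back to camelCase (assumed)."""
    return {
        PARSED_TO_WEB.get(key, snake_to_camel(key)): value
        for key, value in parsed_unit.items()
    }


if __name__ == "__main__":
    # Hypothetical unit; the values are placeholders, not real scenario data.
    parsed = {
        "unit_leader": "Example Leader",
        "unit_type": "Inf",
        "manpower_value": "2",
        "hex_location": "S1234",
        "reinforcement_set": 1,
    }
    print(to_web_unit(parsed))
```

When a field shows up in the parsed file but not in the web JSON, comparing the keys produced by a mapping like this against web/src/types.ts is one way to spot the mismatch.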
For complete structure details, examples, and edge cases, see references/structures.md.
That file contains: