Parse PDFs and text into typed `ContentObject` elements and deterministic chunk metadata for this repo. Use when editing `pptx_gen/ingestion`, changing parser fallback behavior, adjusting provenance or PII-redaction rules, or updating ingestion schemas and tests.
Read `AGENTS.md`, `schemas.py`, `parser.py`, and `chunker.py` before editing this area.
- `parser.py` so tests can run without the full inference stack.
- `chunk_id` in the format `{doc_id}:{element_id}:{chunk_index}` and `locator` in the format `{doc_id}:page{page}`.
- `title`, `heading`, `paragraph`, `list_item`, `table`, `figure`, `caption`.
- `IngestionRequest` and nested models strict with `ConfigDict(extra="forbid")`.
- `paragraph`.
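
The `chunk_id` and `locator` formats named above can be sketched as two plain string builders. This is a minimal illustration only: the function names `make_chunk_id` and `make_locator` and the `ChunkMeta` container are hypothetical and are not taken from `chunker.py` or `schemas.py`. It also assumes that `page` and `chunk_index` are integers.

```python
from dataclasses import dataclass


def make_chunk_id(doc_id: str, element_id: str, chunk_index: int) -> str:
    """Build a chunk_id in the form {doc_id}:{element_id}:{chunk_index}."""
    return f"{doc_id}:{element_id}:{chunk_index}"


def make_locator(doc_id: str, page: int) -> str:
    """Build a locator in the form {doc_id}:page{page}."""
    return f"{doc_id}:page{page}"


@dataclass(frozen=True)
class ChunkMeta:
    """Hypothetical metadata container, used here only for illustration."""

    chunk_id: str
    locator: str


def build_chunk_meta(doc_id: str, element_id: str, chunk_index: int, page: int) -> ChunkMeta:
    # Both fields depend only on the arguments, so repeated calls with the
    # same inputs yield identical metadata (no timestamps, no random IDs).
    return ChunkMeta(
        chunk_id=make_chunk_id(doc_id, element_id, chunk_index),
        locator=make_locator(doc_id, page),
    )


if __name__ == "__main__":
    meta = build_chunk_meta("doc1", "el3", 0, 2)
    print(meta.chunk_id)  # doc1:el3:0
    print(meta.locator)   # doc1:page2
```

Deriving both identifiers purely from their inputs is what makes the metadata deterministic, so a test can compare whole `ChunkMeta` values across runs instead of matching patterns.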