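Fetch and normalize supported source-tutorial inputs into local, traceable text artifacts. **Trigger**: source ingest, ingest sources, normalize tutorial sources, 网页抽取 (web page extraction), 资料归一化 (source normalization). **Use when**: at C1 of `source-tutorial`, when the webpages, PDFs, repos, and docs listed in `sources/manifest.yml` need to be turned into traceable text. **Skip if**: the source manifest is not yet finalized, or the sources have not been confirmed. **Network**: required for remote URLs. **Guardrail**: treat only successfully extracted content as a valid source; failed sources must be recorded on disk and never silently ignored.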
Goal: normalize mixed source inputs into local tutorial-ready text while preserving provenance.
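The guardrail (only successful extractions count as sources, and failures are written to disk) can be sketched as follows. This is a minimal illustration, not the skill's actual implementation: the function names `record_result` and `valid_sources`, the record fields, and the JSONL layout are all assumptions made for the example.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def record_result(log_path, source_id, text=None, error=None):
    """Append one record per source; return True only for a usable extraction.

    Hypothetical helper: the field names below are illustrative assumptions.
    """
    ok = error is None and bool(text and text.strip())
    record = {
        "source_id": source_id,
        "status": "ok" if ok else "failed",
        "chars": len(text) if ok else 0,
        # A failure always carries a reason, so it is never silently dropped.
        "error": None if ok else (error or "empty extraction"),
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }
    log_path = Path(log_path)
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")
    return ok


def valid_sources(log_path):
    """Return the ids of sources whose extraction succeeded."""
    ids = []
    for line in Path(log_path).read_text(encoding="utf-8").splitlines():
        if line.strip():
            rec = json.loads(line)
            if rec["status"] == "ok":
                ids.append(rec["source_id"])
    return ids
```

Writing a record for every attempt, including failures, keeps the log complete: downstream steps can filter on `status` instead of inferring a failure from a missing entry.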
**Files**: `sources/manifest.yml`, `sources/index.jsonl`, `sources/provenance.jsonl`.
**Source kinds**: `webpage`, `pdf`, `markdown`, `repo`, `docs_site`, `video`, as declared in `sources/manifest.yml`.
For `video`, use transcript-first ingestion: `transcript_locator`, `kind: webpage`.
**Script**: `python .codex/skills/source-ingest/scripts/run.py --workspace <ws>`
**Options**: `--workspace <dir>` (required), `--unit-id <U###>`, `--inputs <semicolon-separated>`, `--outputs <semicolon-separated>`, `--checkpoint <C#>`.
**Example**: `python .codex/skills/source-ingest/scripts/run.py --workspace <ws>`
Fix: