Ingest | Skills Pool

Skill file

Ingest

Ingest papers and documents from inbox into the knowledge base. Runs the pipeline to convert PDFs via MinerU (auto-splits long PDFs), Office files (DOCX/XLSX/PPTX) via MarkItDown, extract metadata, deduplicate by DOI, and build indexes. Supports three inboxes: regular papers, theses, and general documents. Use when the user has new papers or documents to process, wants to run the pipeline, or rebuild indexes.

ZimengYuan · 0 stars · 15.03.2026

Categories: Documents

Skill content

Process PDF files, Office documents (DOCX/XLSX/PPTX), or Markdown files from the inbox and ingest them into the knowledge base.

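Supported file formats

| Format | Inbox directory | Processing |
| --- | --- | --- |
| `.pdf` | `data/inbox/` or `data/inbox-doc/` | Converted to Markdown with MinerU |
| `.docx` `.xlsx` `.pptx` | `data/inbox-doc/` | Converted to Markdown with MarkItDown |
| `.md` | Any inbox | Ingested directly (conversion skipped) |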
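
The routing described above can be sketched as a lookup on the file extension. This is only an illustration of the rule, not scholaraio's own code: the names `CONVERTERS` and `converter_for` are hypothetical, and the returned strings are plain labels rather than real identifiers.

```python
from pathlib import PurePosixPath

# Extension -> processing step, as listed for the supported formats.
# Hypothetical labels for illustration; not identifiers from scholaraio.
CONVERTERS = {
    ".pdf": "MinerU",
    ".docx": "MarkItDown",
    ".xlsx": "MarkItDown",
    ".pptx": "MarkItDown",
    ".md": "direct",  # ingested as-is, conversion skipped
}


def converter_for(filename: str) -> str:
    """Return the processing label for a file, based on its extension only."""
    suffix = PurePosixPath(filename).suffix.lower()
    if suffix not in CONVERTERS:
        raise ValueError(f"unsupported format: {filename!r}")
    return CONVERTERS[suffix]
```

The sketch checks only the extension; it does not verify that the file sits in the expected inbox directory.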

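Execution logic

Choose a preset based on the user's intent:
- Ingest new documents (default): use the `ingest` preset
- Full processing: use the `full` preset (ingest + content enrichment + index rebuild)
- Rebuild indexes only: use the `reindex` preset
- Content enrichment only: use the `enrich` preset
Run the pipeline command: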

scholaraio pipeline <preset>
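
As a minimal sketch of how an agent or script might assemble this command, the helper below validates the preset name and returns the argument list. The function `pipeline_command` and the constant `PRESETS` are hypothetical names introduced here; only the `scholaraio pipeline <preset>` form and the four preset names come from the skill text.

```python
# The four presets named in the skill text.
PRESETS = ("ingest", "full", "reindex", "enrich")


def pipeline_command(preset: str = "ingest") -> list[str]:
    """Build the argument list for `scholaraio pipeline <preset>`.

    Hypothetical helper: defaults to `ingest`, the documented default,
    and rejects preset names that the skill text does not list.
    """
    if preset not in PRESETS:
        raise ValueError(f"unknown preset {preset!r}; expected one of {PRESETS}")
    return ["scholaraio", "pipeline", preset]
```

The returned list can be passed to `subprocess.run(..., check=True)` so that a failing pipeline run raises an error instead of passing silently.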