Use when you want an end-to-end workflow across preprocessing, analysis, and reporting, with clear stage gates and a rerun strategy. Portable: no hardcoded paths. The user only needs to provide WoS txt files. Keywords: end-to-end, orchestrate, full pipeline, one-click workflow, staged execution.
Coordinate end-to-end execution from raw WoS data through final report, while keeping the processing and analysis skills decoupled. Fully portable — works on any machine.
The user provides bibliometric data in one of the supported formats. Everything else — processing, analysis, visualization, report generation — is orchestrated by this skill.
Before executing ANY stage, the agent MUST ask the user to confirm:
- Data location: `data/raw/`; start from Stage A (parse and convert).
- `BERTOPIC_MODEL_PATH=<path>`: load the saved model and transform directly, skipping training. Saved under `models/bertopic_model/` (safetensors format).
- `EMBEDDING_MODEL=auto` (default): auto-detect the data language and pick the best model:
  - English: `all-MiniLM-L6-v2`
  - Chinese: `BAAI/bge-base-zh-v1.5`
  - Multilingual: `sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2`
  - Document embeddings are cached to `doc_embeddings.npy`.
- LLM backend (pick one):
  - `LLM_BACKEND=ollama`, `LLM_MODEL=<your local model name>` (e.g. `qwen2.5:14b`)
  - `LLM_BACKEND=openai`, `LLM_URL=http://localhost:8080/v1/chat/completions`, `LLM_MODEL=<model name>`
  - `LLM_BACKEND=openai`, `LLM_URL=<endpoint URL>`, `LLM_API_KEY=<key>`

After confirmation, set the `PROJECT_ROOT` environment variable to the confirmed root directory; all scripts use that path automatically.
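The `EMBEDDING_MODEL=auto` selection above could be sketched roughly as follows. The CJK-character-ratio heuristic and the thresholds are assumptions for illustration, not the skill's actual detection logic:

```python
def pick_embedding_model(texts, multilingual_threshold=0.3):
    """Sketch: pick an embedding model from the share of CJK characters.

    Mostly CJK -> Chinese model; a moderate CJK share -> multilingual
    model; otherwise the English default. Thresholds are illustrative.
    """
    joined = "".join(texts)
    if not joined:
        return "all-MiniLM-L6-v2"
    cjk = sum(1 for ch in joined if "\u4e00" <= ch <= "\u9fff")
    ratio = cjk / len(joined)
    if ratio >= 0.7:
        return "BAAI/bge-base-zh-v1.5"
    if ratio >= multilingual_threshold:
        return "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2"
    return "all-MiniLM-L6-v2"
```

Whatever the real heuristic, caching the resulting `doc_embeddings.npy` means the choice only has to be made once per corpus.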
- `RESEARCH_TOPIC=<short topic name>`: confirm before starting; ask again if `RESEARCH_TOPIC` is missing or ambiguous.

Based on the data format the user provides, automatically decide which stage to enter:
| Data format | Entry stage | Action |
|---|---|---|
| WoS .txt | Stage A | Full pipeline: parse → merge → clean → analyze |
| CSV/Excel without a UT column | Stage A | Treat as non-WoS data: skip txt parsing, attempt field mapping, then enter Stage B |
| CSV/Excel with UT + base columns (TI, AB, PY, etc.) | Stage B | Copy to data/processed/wos_merged.csv; skip Stage A |
| CSV/Excel with UT + country/institution columns | Stage C | Copy to data/processed/wos_cleaned.csv; skip A+B |
| CSV/Excel with UT + country + author columns | Stage D | Run the quality-gate check directly; skip A+B+C |
Format-detection logic (run by the agent during pre-flight):

```python
import pandas as pd

df = pd.read_csv("user_file.csv", nrows=5)  # or pd.read_excel for .xlsx
cols = set(df.columns)
has_ut = "UT" in cols
has_base = {"TI", "AB", "PY"}.issubset(cols)
has_country = any(c in cols for c in ["country_c1", "Country", "C1"])
has_author = any(c in cols for c in ["author_id", "AU", "paper_authors"])
print(f"UT={has_ut}, base={has_base}, country={has_country}, author={has_author}")
# Decide the entry stage from these flags
```
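Given those flags, the routing table above can be expressed as a small decision function (a sketch; the function name and the `is_wos_txt` parameter are illustrative):

```python
def decide_entry_stage(has_ut, has_base, has_country, has_author, is_wos_txt=False):
    """Map detected input properties to a pipeline entry stage.

    Mirrors the routing table: most-complete data enters latest.
    """
    if is_wos_txt:
        return "Stage A"  # full pipeline: parse -> merge -> clean -> analyze
    if not has_ut:
        return "Stage A"  # non-WoS data: field mapping, then Stage B
    if has_country and has_author:
        return "Stage D"  # quality-gate check only
    if has_country:
        return "Stage C"  # copy to wos_cleaned.csv, skip A+B
    if has_base:
        return "Stage B"  # copy to wos_merged.csv, skip A
    return "Stage A"      # UT present but base columns missing: reprocess
```

Checking the most-complete case first (country + author before country alone) keeps the branches consistent with the table's top-down precedence.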
Same as paper-data-processing. All paths are workspace-relative.
```
{workspace}/
├── data/raw/                 # ← User input: WoS txt files
├── data/processed/           # Processing + analysis outputs
├── reports/                  # Final deliverables
│   ├── figures/
│   ├── tables/
│   └── {RESEARCH_TOPIC}_bibliometric_report.md
├── models/
├── src/
└── .github/skills/
```
- paper-data-processing for raw data → processed artifacts.
- paper-processing-analysis-handoff as the mandatory stage gate between processing and analysis.
- paper-analysis-report for analysis → report artifacts, only after the handoff returns analysis-ready.

Before running ANY script at ANY stage, consult the corresponding skill. The stages run in order:

Processing (paper-data-processing skill):
- .txt → src/wos_txt_to_csv.py → data/processed/wos_merged.csv
- Artifacts: data/processed/wos_merged.csv, data/processed/wos_cleaned.csv, data/processed/paper_authors.csv, data/processed/authors_summary.csv

Handoff gate (paper-processing-analysis-handoff skill):
- Returns analysis-ready, ready-with-risks, or blocked-*.
- Proceed only on analysis-ready or ready-with-risks.

Analysis and reporting (paper-analysis-report skill):
- Topics: data/processed/paper_topics.csv, reports/topic_info_*.csv
- Mobility: reports/flow_matrix.csv, reports/talent_flow_sankey.html
- Figures and tables: reports/figures/, reports/tables/
- Report: reports/{RESEARCH_TOPIC}_bibliometric_report.md and reports/{RESEARCH_TOPIC}_bibliometric_report.docx
- Quality gate: skills/paper-analysis-report/scripts/report_quality_gate.py --report reports/{RESEARCH_TOPIC}_bibliometric_report.md
- Checklist: skills/paper-analysis-report/checklists/report-quality-checklist.md

The orchestrator must not export a final report to a generic bibliometric_report.* name when RESEARCH_TOPIC is known. Final report paths should bind to the current topic to avoid stale-report reuse in cold-start testing.
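The stage sequence with hard stops on a blocked gate can be sketched as a simple driver loop. Function names here are illustrative placeholders, not the skills' real entry points:

```python
def run_pipeline(stages, gates):
    """Run (name, stage_fn) pairs in order; stop on any blocked-* gate.

    stages: list of (name, callable) executed in sequence.
    gates:  dict mapping a stage name to a callable returning a status
            string; stages without a gate are treated as analysis-ready.
    """
    log = []
    for name, stage_fn in stages:
        stage_fn()
        status = gates.get(name, lambda: "analysis-ready")()
        log.append((name, status))
        if status.startswith("blocked-"):
            break  # hard stop: later stages must not run
    return log
```

ready-with-risks deliberately does not break the loop; only blocked-* statuses halt execution, matching the handoff contract above.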
| Gate | Check | Blocking? |
|---|---|---|
| Processing → Analysis | UT valid, required columns present, artifacts exist | Yes |
| Analysis → Reporting | Topic + mobility outputs exist, schema correct | Yes |
| Final | Report artifacts generated, key metrics summarized, report quality gate passed | Yes |
1. Copy the workspace folder to the new machine.
2. `python -m venv .venv`
3. `.venv\Scripts\activate` (Windows) or `source .venv/bin/activate` (Linux/macOS)
4. `pip install -r requirements.txt`
5. Place the WoS .txt files in `data/raw/`
6. Ask: "Run the full analysis pipeline starting from the WoS data."