Reverse-engineer raw materials (Sparse idea, Dense idea, experimental log) from an existing AI research paper to build a benchmark case for evaluating paper-writing pipelines. Replicates the PaperWritingBench dataset construction procedure from arXiv:2604.05018 §3 / App. C. TRIGGER when the user asks to "build a benchmark case from this paper", "reverse-engineer raw materials", or "evaluate my pipeline against PaperWritingBench".
Faithful implementation of the PaperWritingBench dataset construction procedure from PaperOrchestra (Song et al., 2026, arXiv:2604.05018, §3 and App. C, F.2).
The original benchmark contains 200 papers (100 CVPR 2025 + 100 ICLR 2025). For each paper, the authors reverse-engineer the (I, E) tuple, that is, the idea I and the experimental log E, by stripping narrative flow from the original PDF using the three prompts in App. F.2. You can use this skill to reverse-engineer your own benchmark cases from any paper PDF.
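The per-paper procedure can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `build_case`, `call_llm`, and the `prompts` mapping are hypothetical names introduced here, and the prompt texts are not reproduced; they would need to be copied verbatim from App. F.2. The output file names follow the `bench/<paper_id>/` layout this document uses.

```python
from pathlib import Path
from typing import Callable, Dict

# Output file names, taken from the bench/<paper_id>/ layout in this document.
OUTPUTS = ("idea_sparse.md", "idea_dense.md", "experimental_log.md")


def build_case(
    paper_text: str,
    paper_id: str,
    prompts: Dict[str, str],
    call_llm: Callable[[str, str], str],
    root: Path = Path("bench"),
) -> Dict[str, Path]:
    """Write one benchmark case and return the paths of the files created.

    `prompts` maps each name in OUTPUTS to its prompt text (hypothetical
    structure; the actual prompt texts live in App. F.2).
    `call_llm(prompt, paper_text)` is a caller-supplied function (an
    assumption, not a real library API) that returns the model's reply.
    """
    missing = [name for name in OUTPUTS if name not in prompts]
    if missing:
        raise KeyError(f"no prompt supplied for: {missing}")

    case_dir = root / paper_id
    case_dir.mkdir(parents=True, exist_ok=True)

    written: Dict[str, Path] = {}
    for name in OUTPUTS:
        # Each call is independent: it sees only its own prompt and the
        # paper text, never the output of another call.
        reply = call_llm(prompts[name], paper_text)
        path = case_dir / name
        path.write_text(reply, encoding="utf-8")
        written[name] = path
    return written
```

Passing `call_llm` in as a parameter keeps the sketch independent of any particular model client and makes it easy to substitute a stub when testing.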
Given an existing AI research paper (PDF or markdown extract), produce:
- idea.md (Sparse variant): high-level concept note, no math, no experimental results
- idea.md (Dense variant): detailed technical proposal with LaTeX equations and variable definitions, but still no experimental results
- experimental_log.md: exhaustive raw experimental setup, numeric data, and qualitative observations, with all narrative references stripped

These three files form a complete (I, E) input pair for the paper-orchestra pipeline. You can then run the pipeline and compare its output to the original paper using paper-autoraters.

- bench/<paper_id>/idea_sparse.md: Sparse variant
- bench/<paper_id>/idea_dense.md: Dense variant
- bench/<paper_id>/experimental_log.md: Experimental log

For each paper, run three independent LLM calls using the verbatim prompts