Name: Paper Writing Bench
Author: Ar9av

Paper Writing Bench

Reverse-engineer raw materials (Sparse idea, Dense idea, experimental log) from an existing AI research paper to build a benchmark case for evaluating paper-writing pipelines. Replicates the PaperWritingBench dataset construction procedure from arXiv:2604.05018 §3 / App. C. TRIGGER when the user asks to "build a benchmark case from this paper", "reverse-engineer raw materials", or "evaluate my pipeline against PaperWritingBench".

Ar9av · 140 stars · 09.04.2026

Categories: Data Engineering

PaperWritingBench (§3)

Faithful implementation of the PaperWritingBench dataset construction procedure from PaperOrchestra (Song et al., 2026, arXiv:2604.05018, §3 and App. C, F.2).

The original benchmark contains 200 papers (100 CVPR 2025 + 100 ICLR 2025). For each paper, the authors reverse-engineer the (I, E) tuple by stripping narrative flow from the original PDF using the three prompts in App. F.2. You can use this skill to reverse-engineer your own benchmark cases from any paper PDF.
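As a minimal sketch of what one reverse-engineered case might look like in code, the container below pairs an idea note with the experimental log. It assumes that I denotes the idea and E the experimental log; the class name `BenchmarkCase`, its fields, and the `inputs` method are hypothetical and do not come from the paper or from the skill itself.

```python
from dataclasses import dataclass
from typing import Literal, Tuple

Variant = Literal["sparse", "dense"]


@dataclass(frozen=True)
class BenchmarkCase:
    """One reverse-engineered case (hypothetical container, not from the paper)."""

    paper_id: str          # any identifier you choose for the source paper
    sparse_idea: str       # high-level concept note
    dense_idea: str        # detailed technical proposal
    experimental_log: str  # raw setup, numbers, and observations

    def inputs(self, variant: Variant) -> Tuple[str, str]:
        """Return the (I, E) pair for the requested idea variant."""
        if variant == "sparse":
            return (self.sparse_idea, self.experimental_log)
        if variant == "dense":
            return (self.dense_idea, self.experimental_log)
        raise ValueError(f"unknown variant: {variant!r}")


# Usage: both variants share the same experimental log.
case = BenchmarkCase(
    paper_id="demo-paper",
    sparse_idea="Concept note text",
    dense_idea="Technical proposal text",
    experimental_log="Raw experimental log text",
)
idea, log = case.inputs("dense")
```

Keeping the log shared between the two variants reflects the description above, where only the idea note changes in level of detail.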

What this skill does

Given an existing AI research paper (PDF or markdown extract), produce:

idea.md (Sparse variant) — high-level concept note, no math, no experimental results
idea.md (Dense variant) — detailed technical proposal with LaTeX equations and variable definitions, but still no experimental results
experimental_log.md — exhaustive raw experimental setup, numeric data, and qualitative observations, with all narrative references stripped

These three files form a complete (I, E) input pair for the paper-orchestra pipeline. You can then run the pipeline and compare its output to the original paper using .
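The constraints on the three outputs can be spot-checked with a few heuristics. The sketch below is only an illustration: the function names and regular expressions are assumptions, and it interprets "narrative references" as pointers such as "Table 3" or "Figure 2", which the source does not spell out. It is a rough lint, not the procedure from App. F.2.

```python
import re
from typing import List

# Heuristic: inline $...$ math, \( or \[ delimiters, or equation/align environments.
# Plain dollar amounts such as "$5 and $10" can trigger false positives.
MATH_PATTERN = re.compile(
    r"\$[^$\n]+\$|\\\(|\\\[|\\begin\{(?:equation|align)\*?\}"
)

# Heuristic: pointers back into the original paper's narrative (assumed meaning).
REF_PATTERN = re.compile(
    r"\b(?:Table|Figure|Fig\.|Section|Sec\.|Appendix)\s*\d+", re.IGNORECASE
)


def lint_sparse_idea(text: str) -> List[str]:
    """Return math-like fragments found in a Sparse idea note (should be empty)."""
    return [m.group(0) for m in MATH_PATTERN.finditer(text)]


def lint_experimental_log(text: str) -> List[str]:
    """Return leftover narrative references found in an experimental log."""
    return [m.group(0) for m in REF_PATTERN.finditer(text)]


# Usage: an empty list means the heuristic found nothing to flag.
math_hits = lint_sparse_idea("Loss is $L = x^2$.")
ref_hits = lint_experimental_log("As shown in Table 3, accuracy is 71.2.")
```

A clean result does not prove the files are correct; it only means none of these simple patterns matched, so a manual read is still worthwhile.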
