Run the four paper-quality autoraters from PaperOrchestra (arXiv:2604.05018, App. F.3) — Citation F1 (P0/P1 partition + Precision/Recall/F1), Literature Review Quality (6-axis 0-100 with anti-inflation rules), SxS Overall Paper Quality (side-by-side), and SxS Literature Review Quality (side-by-side). TRIGGER when the user asks to "score this paper draft", "evaluate against the benchmark", "compare two papers", or "run the autoraters".
Faithful implementation of the four LLM-as-judge autoraters used in PaperOrchestra (Song et al., 2026, arXiv:2604.05018, §5 and App. F.3).
These are the metrics the paper uses to show that PaperOrchestra outperforms its single-agent and AI-Scientist-v2 baselines:
| Autorater | What it does | Inputs | Output |
|---|---|---|---|
| Citation F1 — P0/P1 partition | Partitions the reference list into P0 (must-cite) and P1 (good-to-cite) given the paper text | one paper text + its references list | JSON {ref_num: "P0" \| "P1"} |
| Literature Review Quality | 6-axis 0-100 score for Intro+Related Work, with anti-inflation hard caps | one paper PDF/text + reference avg citation count | JSON with axis_scores, penalties, summary, overall_score |
| SxS Overall Paper Quality | Holistic side-by-side preference judgment | two papers (PDF or text) | JSON with winner ∈ {paper_1, paper_2, tie} |
| SxS Literature Review Quality | Side-by-side preference, Intro+Related Work only | two papers | JSON with winner ∈ {paper_1, paper_2, tie} |
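As a concrete sketch, the Citation F1 step can be scored once the judge has returned its P0/P1 partition. The exact matching rule lives in the prompt file; the formulation below is an assumption for illustration, not the paper's code: recall is coverage of the must-cite (P0) references, and precision is the fraction of the generated paper's citations that land anywhere in the partition (P0 or P1).

```python
def citation_f1(partition: dict[str, str], cited: set[str]) -> dict[str, float]:
    """Score a generated paper's citations against a judge's partition.

    partition maps reference id -> "P0" (must-cite) or "P1" (good-to-cite);
    cited is the set of reference ids the generated paper actually cites.
    """
    p0 = {ref for ref, tag in partition.items() if tag == "P0"}
    p0_or_p1 = set(partition)
    # Precision: cited references that the judge considers at least good-to-cite.
    precision = len(cited & p0_or_p1) / len(cited) if cited else 0.0
    # Recall: how many must-cite references were actually cited.
    recall = len(cited & p0) / len(p0) if p0 else 0.0
    denom = precision + recall
    f1 = (2 * precision * recall / denom) if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}
```

With this reading, a paper that cites every P0 reference and nothing outside P0 ∪ P1 scores a perfect 1.0 on both axes.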
The paper uses Gemini-3.1-Pro and GPT-5 as judges, at temperature 0.0 for Gemini and the default 1.0 for GPT-5 (which doesn't allow temperature adjustment). Use whatever your host LLM is.
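A minimal sketch of carrying those sampling settings into a judge call. The `judge_request` helper and the request shape are assumptions for illustration; only the temperature values come from the paper.

```python
# Per-judge sampling settings from the paper; GPT-5 only runs at its default.
JUDGE_TEMPERATURE = {
    "gemini-3.1-pro": 0.0,
    "gpt-5": 1.0,  # fixed default; the model doesn't allow adjustment
}

def judge_request(model: str, prompt: str) -> dict:
    """Build a provider-agnostic request dict for one autorater call."""
    req = {"model": model, "prompt": prompt}
    temp = JUDGE_TEMPERATURE.get(model, 1.0)
    # Only set temperature when the judge accepts a non-default value.
    if temp != 1.0:
        req["temperature"] = temp
    return req
```

Swap in your host LLM's actual client call where the request dict is consumed.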
This is a two-step procedure:

1. For both the ground-truth paper AND the generated paper, run the LLM with `references/citation-f1-prompt.md`: