Extracts a structured concern sheet from official OpenReview reviews. Use when building concern alignment data for a paper.
Given OpenReview PDFs (reviews + meta-review), extract an official concern sheet:
Record the top-level AC fields (`ac_decision_drivers`, `pro_accept`, `pro_reject`, `ac_decisive_negative_ids`), then list each concern with the following fields:

- `verbatim`: quote (roughly 25 words or fewer) or near-verbatim paraphrase.
- `raised_by`: reviewer IDs; if several reviewers raise the same concern, merge them into one concern and list all IDs in the `raised_by` entries.
- `severity`, one of:
  - `fatal`: undermines validity/claims in a way that blocks acceptance absent major redesign.
  - `major`: substantial weakness likely to affect acceptance unless convincingly addressed.
  - `moderate`: real concern but not typically blocking on its own.
  - `minor`: polish-level, presentation, or "nice to have."
- `resolved_in_rebuttal`: true / false / null (if unknown).
- `rebuttal_resolution`: brief description of how it was resolved (if applicable).
- `ac_treatment` (the key field for ground truth), one of:
  - `decisive_blocker`: AC explicitly identified this as driving the rejection.
  - `unresolved`: concern was raised and not resolved in rebuttal; may or may not have been decisive.
  - `resolved`: concern was raised but satisfactorily addressed in rebuttal (per AC judgment).
  - `accepted_limitation`: concern stands but AC explicitly accepts it as non-blocking (e.g., a cost-justified scope limitation, acknowledged but not fatal).
  - `dismissed`: AC explicitly dismissed the concern as not relevant or not significant.
  - `reframed_feature`: AC or reviewers reinterpreted the weakness as a positive (e.g., "trivially defensible" becomes "demonstrates a blind spot").
  - `not_mentioned`: AC did not address this concern in the meta-review.
  - `unknown`: insufficient information to determine AC treatment.
- `decisive`: true if this concern was THE reason (or one of the reasons) for the final decision. This normally accompanies `decisive_blocker`, not `resolved` or `accepted_limitation`; with an `accepted_limitation`, the concern is real and the severity stands, but it didn't block the decision.
- `issue_type`, a high-level axis orthogonal to tags:
  - `conceptual`: concerns about the core idea, contribution depth, or theoretical framework (e.g., "trivially defensible," "not fundamentally different from RAG poisoning," "no definition of unreliable source").
  - `empirical`: concerns about experimental scope, additional benchmarks, missing baselines, statistical rigor (e.g., "single benchmark," "missing Claude models," "no human study").
  - `framing`: concerns about how the paper positions its contribution, title-content alignment, overclaiming (e.g., "title says red-teaming but it's a static benchmark," "claims to solve X but actually addresses Y").

Next, scan all reviews and the meta-review for specific papers or literature cited as critical to the assessment. Not all papers will have these; only extract when reviewers explicitly name papers that influenced their judgment.
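As a minimal sketch, one concern entry with the fields above might look like this in the output YAML (the `id` value, quote, and reviewer IDs are hypothetical, and an `id` field is assumed for cross-referencing from `concern_ids` and `ac_decisive_negative_ids`):

```yaml
- id: C1
  verbatim: "Only a single benchmark is evaluated, so generality is unclear."  # hypothetical quote
  raised_by: [R1, R3]
  severity: major
  resolved_in_rebuttal: false
  rebuttal_resolution: null
  ac_treatment: unresolved
  decisive: false
  issue_type: empirical
```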
For each critical reference, record:
- `id`: CR1, CR2, ... (sequential).
- `title`: full paper title (as identifiable as possible for search matching).
- `short_name`: how reviewers refer to it (e.g., "LATS", "ToolChain*").
- `cited_by`: which reviewer(s) mentioned it.
- `role`: why it was cited, one of:
  - `missing_comparison`: reviewer says the paper should have compared against this.
  - `novelty_precedent`: reviewer cites this as evidence the contribution is incremental.
  - `methodological_basis`: reviewer says this is the real foundation/prior work.
  - `positive_positioning`: reviewer cites this favorably to support the paper.
  - `missing_citation`: reviewer flags a missing citation (not necessarily critical).
  - `benchmark_precedent`: reviewer compares it to an existing benchmark/dataset.
- `decisive`: did this reference influence the accept/reject decision?
- `concern_ids`: which concerns reference this paper (optional).
- `verbatim`: near-verbatim quote from the reviewer citing this paper (≤25 words).

**What to extract:** papers that reviewers argue should have been compared against, that establish the novelty baseline, or that the AC references when explaining the decision. Generic "see also" citations are not critical references.
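As a sketch of one `critical_references` entry under the fields above (the reviewer, quote, and concern link are hypothetical; "LATS" reuses the short-name example from this spec):

```yaml
- id: CR1
  title: "Language Agent Tree Search Unifies Reasoning, Acting, and Planning in Language Models"
  short_name: "LATS"
  cited_by: [R2]
  role: missing_comparison
  decisive: true
  concern_ids: [C3]
  verbatim: "A direct comparison with LATS is needed to establish novelty."  # hypothetical quote
```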
**What NOT to extract:** the paper's own cited references, unless a reviewer specifically argues they were inadequately addressed, and background citations mentioned in passing without bearing on the assessment.
Place the `critical_references` array after `ac_decision_drivers` and before `ac_decisive_negative_ids` in the output YAML.
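To make the ordering concrete, the top level of the output might be arranged like this (only the three keys named above are shown; the positions of `pro_accept` and `pro_reject` are not specified here, and `...` marks elided content):

```yaml
ac_decision_drivers:
  - ...
critical_references:
  - ...
ac_decisive_negative_ids: [...]
```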
Double-check `ac_treatment`; this is the most important annotation. Every concern listed in `ac_decisive_negative_ids` MUST have `ac_treatment: decisive_blocker` and `decisive: true`. If there's a mismatch, re-read the meta-review and fix whichever field is wrong. The lint script (`lint_concern_alignment.py`) will catch this automatically.

Emit YAML conforming to the `OfficialConcernSheet` schema (see `calibration/concern_alignment/schemas/official_concern_sheet.schema.yaml`).
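The lint script's internals aren't shown here; a minimal Python sketch of the consistency check it might perform on an already-parsed sheet (the `concerns` key, the function name, and the message formats are assumptions, not the script's actual API):

```python
def check_decisive_blockers(sheet: dict) -> list[str]:
    """Return error messages for concerns listed in ac_decisive_negative_ids
    that lack ac_treatment: decisive_blocker or decisive: true."""
    concerns = {c["id"]: c for c in sheet.get("concerns", [])}
    errors = []
    for cid in sheet.get("ac_decisive_negative_ids", []):
        concern = concerns.get(cid)
        if concern is None:
            # The ID list names a concern that was never defined.
            errors.append(f"{cid}: listed as decisive but not present in concerns")
            continue
        if concern.get("ac_treatment") != "decisive_blocker":
            errors.append(f"{cid}: ac_treatment should be decisive_blocker")
        if concern.get("decisive") is not True:
            errors.append(f"{cid}: decisive should be true")
    return errors
```

A complete linter would also need the opposite-direction check: any concern annotated `decisive_blocker` should appear in `ac_decisive_negative_ids`.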
See `calibration/concern_alignment/official/` for worked examples (available in local calibration data).