Full adversarial review at top-venue standard (RSS/CoRL/IJRR/T-RO). Graduated pressure, six attack vectors from review_guideline §3.1-3.6, mechanical verdict via metrics.verdict. Use to review, critique, audit a paper, or self-review a draft.
Invoked for full adversarial review at the standard of top robotics venues
(RSS, CoRL, IJRR, T-RO, ICRA, IROS). This is the largest and most
judgment-heavy skill in the system. It composes three sub-skills via the
Task tool:
- `concurrent-work-check` (for the §3.1 scoop detection)
- `formalization-check` (for the §3.2 formalization attack)
- `experiment-audit` (for the §3.5 validation attack)

Maps to the VALIDATE stage of the research state machine (for self-review) or to the review-agent workflow (for external paper review).
Average reviewers score dimensions independently and average. Top reviewers trace the logical chain and find where it breaks:
SIGNIFICANCE → FORMALIZATION → CHALLENGE → APPROACH → VALIDATION
One broken link is a structural flaw that no score-averaging can compensate for. Search for breaks.
Extract the logical chain. Run the quick fatal-flaw scan per Appendix A.1 of review_guideline.md: stop on the first fatal flaw for first-pass efficiency. If none is found, proceed.
Apply ALL six attack vectors systematically (§3.1-3.6). Invoke sub-skills for the deepest checks. Produce findings with severity classification.
For each previous finding, track: addressed | partially | not | regressed.
Check for NEW weaknesses introduced by the revisions, and run a regression check on severity.
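The four-way tracking plus the severity regression check can be sketched as below. The severity ranking and the shape of the two inputs (finding id mapped to severity, for the previous and current review) are illustrative assumptions, not the repository's data model:

```python
# Hypothetical sketch: classify how each previous finding fared in a revision.
SEVERITY_RANK = {"minor": 0, "serious": 1, "fatal": 2}

def track(prev: dict, curr: dict) -> dict:
    """Map finding id -> 'addressed' | 'partially' | 'not' | 'regressed'."""
    status = {}
    for fid, old_sev in prev.items():
        new_sev = curr.get(fid)
        if new_sev is None:
            status[fid] = "addressed"    # finding no longer applies
        elif SEVERITY_RANK[new_sev] > SEVERITY_RANK[old_sev]:
            status[fid] = "regressed"    # revision made it worse
        elif SEVERITY_RANK[new_sev] < SEVERITY_RANK[old_sev]:
            status[fid] = "partially"    # downgraded but still present
        else:
            status[fid] = "not"          # unchanged
    return status
```

For example, `track({"f1": "serious", "f2": "minor"}, {"f2": "serious"})` yields `{"f1": "addressed", "f2": "regressed"}`.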
```bash
PYTHONPATH=src python -c "
from alpha_research.tools.paper_fetch import fetch_and_extract
import json, sys

c = fetch_and_extract(sys.argv[1])
print(json.dumps({
    'title': c.title,
    'abstract': c.abstract,
    'sections': c.sections,
    'extraction_quality': c.extraction_quality.overall,
    'math_preserved': c.extraction_quality.math_preserved,
}, indent=2, default=str))
" "<paper_id>"
```
If the artifact is a local draft, use Read directly.
Extract each link of the chain as ONE sentence:
Run the Appendix A.1 quick scan. If any check fails with severity fatal, return early with that single finding.
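The early-return behavior can be sketched as follows; the check names and callables are placeholders, since Appendix A.1 defines the real list:

```python
# Hypothetical sketch: first-pass scan that returns on the first fatal check.
def quick_scan(checks):
    """checks: iterable of (name, fn), where fn() -> 'fatal' or 'ok'.
    Return the single fatal finding, or None if the first pass is clean."""
    for name, fn in checks:
        if fn() == "fatal":
            return {"severity": "fatal", "check": name}  # stop: one finding
    return None
```

Later checks are never run once a fatal flaw is found, which is exactly the first-pass efficiency the step asks for.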
Invoke `concurrent-work-check` via Task tool.

Invoke `formalization-check` via Task tool. Import its findings as serious weaknesses if the formalization level is `prose_only` at a top venue or `absent` at any venue.

Invoke `experiment-audit` via Task tool (relevant triggers: t12, t15). Fold its findings into the review.
Per review_guideline.md §1.1, before attacking you must construct the STRONGEST version of the paper's argument. Write 3-5 sentences that re-express the paper's position "so clearly, vividly, and fairly that the authors would say 'I wish I'd put it that way'" (RSS).
This is not optional. A review without a substantive steel-man is an unfair review.
For each finding produced in Iteration 2, classify:
- `severity`: "fatal" | "serious" | "minor"
- `attack_vector`: "3.1" | "3.2" | ... | "3.6"
- `what_is_wrong`: str
- `why_it_matters`: str
- `what_would_fix_it`: str
- `falsification`: str — "If the authors showed X, this critique would be invalidated"
- `grounding`: str — specific section/figure/equation reference
- `fixable`: bool — can this be addressed in a revision?
- `maps_to_trigger`: "t2" | "t4" | ... | "t15" | null

Vague findings are PROHIBITED. Every finding must be specific, grounded, and falsifiable.
```bash
PYTHONPATH=src python -c "
from alpha_research.metrics.verdict import compute_verdict
from alpha_research.models.review import Finding, Severity
from alpha_research.models.blackboard import Venue
import json, sys

findings_json = json.loads(sys.argv[1])
findings = [Finding(**f) for f in findings_json]
venue = Venue[sys.argv[2]]
significance_score = int(sys.argv[3])
verdict = compute_verdict(findings, venue=venue, significance_score=significance_score)
print(json.dumps({'verdict': verdict.value if hasattr(verdict, 'value') else str(verdict)}))
" '<findings_json>' RSS 3
```
The verdict is computed per review_plan.md §1.9:
DO NOT form a gestalt judgment. Use the mechanical output.
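To illustrate "mechanical, not gestalt": a rule-based verdict looks like the sketch below. The thresholds here are invented for illustration only; the real rules live in review_plan.md §1.9 and `compute_verdict`.

```python
# Hypothetical rule set, NOT the real compute_verdict: thresholds are
# illustrative. The point is that the verdict is a pure function of the
# findings, with no room for gestalt adjustment.
def verdict_sketch(n_fatal: int, n_serious: int, significance_score: int) -> str:
    if n_fatal > 0:
        return "reject"          # any fatal finding is decisive
    if n_serious >= 3:
        return "weak_reject"     # too many serious flaws to fix in rebuttal
    if n_serious >= 1:
        # a fixable serious flaw is weighed against significance
        return "weak_reject" if significance_score < 4 else "weak_accept"
    return "accept"
```

Given the same findings, two runs produce the same verdict; that reproducibility is what the mechanical output buys.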
```bash
PYTHONPATH=src python -c "
from alpha_research.records.jsonl import append_record
from pathlib import Path
import json, sys

rid = append_record(Path(sys.argv[1]), 'review', json.loads(sys.stdin.read()))
print(rid)
" "<project_dir>" <<< '<review_json>'
```
```json
{
  "artifact_id": "arxiv:2501.12345",
  "venue": "RSS",
  "iteration": 2,
  "chain_extraction": {
    "task": "...",
    "problem": "...",
    "challenge": "...",
    "approach": "...",
    "contribution": "...",
    "chain_complete": true,
    "broken_links": []
  },
  "steel_man": "The paper's central insight is that ... This is non-obvious because ... The experimental result on X genuinely demonstrates ...",
  "findings": {
    "fatal": [],
    "serious": [
      {
        "severity": "serious",
        "attack_vector": "3.5",
        "what_is_wrong": "Only 6 trials per condition reported in Table 2",
        "why_it_matters": "Below RSS threshold of 20; variance estimates are unreliable at this sample size",
        "what_would_fix_it": "Rerun with 20+ trials per condition and report 95% CI",
        "falsification": "If authors show CI [.55,.80] after rerunning with n=20, this concern is addressed",
        "grounding": "Table 2, §5.1",
        "fixable": true,
        "maps_to_trigger": null
      }
    ],
    "minor": []
  },
  "verdict": "weak_reject",
  "confidence": 4,
  "questions_for_authors": [
    "Please provide trial counts and CI for Table 2 results.",
    "Did you compare against RT-2 fine-tuned on your task? If not, why not?"
  ],
  "what_would_increase_score": "Address the missing RT-2 baseline AND provide ≥20 trials per condition with CI. Both fixes are feasible in a rebuttal period.",
  "anti_patterns_avoided": ["dimension_averaging", "false_balance", "novelty_fetishism"]
}
```
You CAN assess with high confidence:
You CANNOT assess:
- defer to `formalization-check` and flag

Anti-patterns to avoid (review_guideline.md §5.4):
- guidelines/doctrine/review_guideline.md Part III — attack vectors §3.1-3.6 (primary)
- guidelines/doctrine/review_guideline.md Part IV — venue calibration
- guidelines/doctrine/review_guideline.md §5.4 — anti-patterns
- guidelines/spec/review_plan.md §1 — executable metrics for every finding
- guidelines/spec/review_plan.md §1.9 — verdict computation rules
- guidelines/spec/review_plan.md §3 — graduated pressure protocol
- skills/concurrent-work-check/SKILL.md — sub-skill
- skills/formalization-check/SKILL.md — sub-skill
- skills/experiment-audit/SKILL.md — sub-skill