Automated peer review of research papers using Shadow-LEGO's self-improving cascade. Multiple fictional reviewer personas score papers against arxiv exemplars on standard rubrics (soundness, novelty, clarity, significance, presentation). The cascade learns from teacher reviews to produce increasingly accurate local reviews at zero marginal cost.
STOP. READ THIS ENTIRE SKILL.MD BEFORE CALLING ANY ENDPOINT.
Automated peer review using Shadow-LEGO's 4-tier self-improving cascade.
This skill generates multi-perspective peer reviews of research papers through a 4-tier cascade:
| Tier | What | Tool | Cost | Latency |
|---|---|---|---|---|
| 0 | Heuristic checks: word count, citation count, LaTeX errors, section balance, forbidden phrases | Rules in review_heuristics.py | Free | μs |
| 0.5 | Section quality classifier: pass/fail on structure, grounding, tone | DistilBERT via /create-classifier | Free | ~15ms |
| 1.5 | Reviewer persona GPT: detailed critique with rubric scores and rationale | QLoRA Qwen2.5-1.5B via /create-gpt | Free | ~300ms |
| 2 | Teacher review: full multi-aspect SWIF2T-style feedback | /scillm batch.py | Paid | 3-8s |
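As a concrete illustration of the cheapest tier, here is a minimal sketch of the kind of rules review_heuristics.py implements; the function name, thresholds, and phrase list are illustrative assumptions, not the skill's actual values:

```python
import re

# Illustrative only: the real thresholds and phrases live in review_heuristics.py.
FORBIDDEN_PHRASES = ["as we all know", "it is obvious that"]

def tier0_checks(tex: str) -> list[str]:
    """Free, microsecond-scale heuristic checks over raw LaTeX source."""
    findings = []
    words = len(tex.split())
    if words < 3000:  # hypothetical minimum length
        findings.append(f"short paper: {words} words")
    citations = len(re.findall(r"\\cite[tp]?\{", tex))
    if citations < 10:  # hypothetical citation floor
        findings.append(f"low citation count: {citations}")
    # Unbalanced environments are a cheap proxy for LaTeX errors.
    if tex.count(r"\begin{equation}") != tex.count(r"\end{equation}"):
        findings.append("unbalanced equation environments")
    for phrase in FORBIDDEN_PHRASES:
        if phrase in tex.lower():
            findings.append(f"forbidden phrase: {phrase!r}")
    return findings
```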
Every local review runs in shadow mode alongside the teacher; shadow observations are logged to shadow.jsonl per reviewer persona.
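A hedged sketch of what one shadow observation might look like on disk; the record schema and field names are assumptions for illustration:

```python
import json
from pathlib import Path

def log_shadow_observation(path: Path, reviewer: str, section: str,
                           local_score: int, teacher_score: int) -> None:
    """Append one shadow observation comparing a local review to the teacher's."""
    record = {
        "reviewer": reviewer,            # persona id, e.g. "alpha"
        "section": section,
        "local_score": local_score,      # Tier 1.5 reviewer-GPT score
        "teacher_score": teacher_score,  # Tier 2 teacher score
        "agree": local_score == teacher_score,
    }
    with path.open("a") as f:
        f.write(json.dumps(record) + "\n")
```

Papers are compared against top-cited arxiv exemplars (see the compare-exemplars command and exemplar_comparison.json output below).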
Four fictional reviewers are defined in personas/reviewers.yaml (Alpha/Tanaka, Beta/Webb, Gamma/Krishnamurthy, Delta/Lindqvist).
Each reviewer scores on 6 dimensions (1-4 scale, NeurIPS/ICLR convention), including soundness, novelty, clarity, significance, and presentation.
Overall recommendation: decoupled 1-10 holistic judgment (ICLR convention). The weighted signal from dimension scores is advisory — reviewers make holistic calls (e.g., moderate novelty + immense significance = accept).
Accept/reject thresholds are applied to the aggregated scores in the meta-review; /paper-lab's final gate targets an 8+/10 average across all four reviewers (see the integration notes below).
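To make the advisory-versus-holistic split concrete, here is a minimal sketch of the advisory signal; the dimension weights and the score mapping are hypothetical placeholders, not the skill's configured values:

```python
# Hypothetical weights; the real values would live in the skill's config.
WEIGHTS = {"soundness": 0.3, "novelty": 0.2, "clarity": 0.15,
           "significance": 0.25, "presentation": 0.1}

def advisory_signal(scores: dict[str, int]) -> float:
    """Weighted 1-4 dimension scores, scaled to the 1-10 recommendation range."""
    weighted = sum(WEIGHTS[d] * scores[d] for d in WEIGHTS)  # stays in [1, 4]
    return 1 + (weighted - 1) * 3  # linear map from [1, 4] onto [1, 10]

# The reviewer's overall 1-10 recommendation remains a decoupled holistic call;
# e.g. moderate novelty plus immense significance may still justify an accept.
```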
The skill embeds an arxiv benchmark corpus (data/arxiv_benchmarks.json) that
provides empirical distributions for structural metrics. Every review command
automatically runs a benchmark eval and attaches it to the meta-review, so the
Tier 2 reviewer (project agent) has percentile rankings for scoring decisions.
| Metric | Source | What it measures |
|---|---|---|
| eq_per_pg | Equation environments / pages | Mathematical rigor |
| fig_per_pg | Figure environments / pages | Visual communication |
| ref_per_pg | References / pages | Citation density |
| footnotes | \footnote{} count | Scholarly depth |
| tables | Table environments | Data presentation |
Papers are scored by percentile rank against the corpus: weak (<p25), ok (p25-p75), strong (>p75).
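A minimal sketch of that percentile scoring, assuming data/arxiv_benchmarks.json stores a list of corpus values per metric (the schema is an assumption):

```python
import json
from bisect import bisect_left

def percentile_band(metric: str, value: float,
                    corpus_path: str = "data/arxiv_benchmarks.json") -> str:
    """Rank one of a paper's metrics against the corpus and bucket it."""
    with open(corpus_path) as f:
        corpus = json.load(f)
    values = sorted(corpus[metric])  # assumed shape: {"eq_per_pg": [...], ...}
    pct = 100 * bisect_left(values, value) / len(values)
    if pct < 25:
        return "weak"
    return "strong" if pct > 75 else "ok"
```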
```bash
# Add a paper from a local .tex file
bash run.sh ingest-benchmark 2305.05176 --tex /path/to/paper.tex

# Score your paper against the corpus
bash run.sh benchmark paper_output/sensai_cascade/draft.tex

# The corpus auto-grows as /dogpile discovers new arxiv papers
```
```bash
# Full peer review of a paper
bash run.sh review artifacts/neophyte_paper_v5/

# Benchmark eval: score paper against arxiv corpus
bash run.sh benchmark paper_output/draft.tex

# Add paper to benchmark corpus
bash run.sh ingest-benchmark 2305.05176 --tex path/to/paper.tex

# Compare against arxiv exemplars
bash run.sh compare-exemplars artifacts/neophyte_paper_v5/ --papers 2305.05176,2411.05844

# Score a single section
bash run.sh score-section artifacts/neophyte_paper_v5/sections/eval.tex --reviewer alpha

# Train reviewer GPTs from shadow data
bash run.sh train --reviewer alpha --data ~/.pi/assistant/shadow.jsonl

# Shadow agreement report
bash run.sh shadow-report

# Generate a visual review summary (via /create-figure + /analytics)
bash run.sh visualize artifacts/neophyte_paper_v5/ --output review_dashboard.png
```
Reviews are written to {paper_dir}/reviews/:
```
reviews/
├── alpha_review.json         # Per-reviewer structured review
├── beta_review.json
├── gamma_review.json
├── delta_review.json
├── meta_review.json          # Aggregated scores + decision; also includes
│                             #   benchmark_eval percentiles and pen_name_violations
├── exemplar_comparison.json
├── shadow.jsonl              # Shadow observation log
└── figures/
    ├── rubric_radar.png      # Radar chart of scores per reviewer
    ├── exemplar_gap.png      # Gap analysis vs exemplars
    └── score_history.png     # Score convergence across drafts
```
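Downstream tools can read the meta-review directly. A short sketch, assuming top-level decision, benchmark_eval, and pen_name_violations keys; the exact field shapes are assumptions:

```python
import json

with open("reviews/meta_review.json") as f:
    meta = json.load(f)

print("decision:", meta["decision"])
# Percentile bands attached by the automatic benchmark eval
for metric, band in meta["benchmark_eval"].items():
    print(f"{metric}: {band}")
# Any real persona names found in the paper text (HIGH severity)
for violation in meta.get("pen_name_violations", []):
    print("pen-name violation:", violation)
```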
```
Draft N → 4 reviewer personas score (shadow mode)
        → Teacher confirms/overrides
        → Disagreements logged to shadow.jsonl
        → When agreement ≥ 90%: reviewer GPT promoted
        → Next draft: reviewer GPT scores autonomously (free, on-device)
        → /create-figure generates visual diff between drafts
```
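A minimal sketch of the promotion check over shadow.jsonl, reusing the illustrative record format from the shadow-mode sketch above (the actual format may differ):

```python
import json

def promotion_ready(shadow_path: str, reviewer: str,
                    threshold: float = 0.90) -> bool:
    """Check whether a reviewer GPT's teacher agreement clears the promotion bar."""
    with open(shadow_path) as f:
        records = [json.loads(line) for line in f if line.strip()]
    mine = [r for r in records if r["reviewer"] == reviewer]
    if not mine:
        return False
    agreement = sum(r["agree"] for r in mine) / len(mine)
    return agreement >= threshold
```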
Papers reviewed by /create-peer-review MUST use pen names for all persona references. Real people never appear in papers.
| Real Persona | Pen Name | Role in Papers |
|---|---|---|
| Brandon Bailey | (use pen name from personas.yaml) | Security/SPARTA analysis |
| Margaret Chen | (use pen name from personas.yaml) | Compliance/extraction |
| Jennifer Cheung | (use pen name from personas.yaml) | RMF/DISA validation |
| Embry Lawson | (use pen name from personas.yaml) | System architecture |
| Graham Anderson | Graham Anderson | Human architect (real person, OK as author) |
The reviewer personas (Alpha/Tanaka, Beta/Webb, Gamma/Krishnamurthy, Delta/Lindqvist) are already fictional and need no mapping.
/create-peer-review flags any real persona name found in paper text as a HIGH severity finding.
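A minimal sketch of that scan; in practice the real-name list would be derived from personas.yaml rather than hard-coded, and the helper name is illustrative:

```python
import re

# Real names that must never appear in paper text. Graham Anderson is excluded:
# he is a real person who is explicitly allowed as an author.
REAL_NAMES = ["Brandon Bailey", "Margaret Chen", "Jennifer Cheung", "Embry Lawson"]

def pen_name_violations(paper_text: str) -> list[str]:
    """Return every real persona name found in the paper (HIGH severity)."""
    return [name for name in REAL_NAMES
            if re.search(re.escape(name), paper_text, re.IGNORECASE)]
```

The /create-paper skill can invoke /create-peer-review as a quality gate: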
```python
# In create-paper's multi-draft loop:
for draft_num in range(max_drafts):
    generate_draft(section)
    review = peer_review(section, reviewers=["alpha", "beta", "gamma", "delta"])
    if review.decision == "accept":
        break
    revision_notes = review.aggregate_feedback()
    # Feed revision notes back into the next draft
```
/create-peer-review serves as the final quality gate in the /paper-lab convergence loop:
```
/paper-lab Phase 2 (headless):
    Round N: /review-paper → fixes → /review-paper (converge on 8.5+)

/paper-lab Phase 3 (final gate):
    /create-peer-review → 4 persona reviews + benchmark eval
    Target: 8+/10 average across all 4 reviewers
```
| Skill | Integration |
|---|---|
| /review-paper | Internal quality signal; /create-peer-review is the external simulation |
| /dogpile | Future: search the author's other papers for deeper reviewer profiles |
| /interview | Phase 3: author resolves reviewer questions (never during automated review) |
| /paper-lab | Orchestrator; calls /create-peer-review as the final quality gate |