Multi-dimensional screening of research ideas: novelty check + venue reviewer simulation + strategic fit assessment. Use when user says "screen ideas", "evaluate ideas", "review ideas", "novelty check", "查新", "筛选idea", or wants to rank and filter research ideas before committing to execution.
Screen and rank research ideas: $ARGUMENTS
Constants:

- `REVIEWER_MODEL` = `gpt-5.4` — Model used via Codex MCP. Must be an OpenAI model (e.g., gpt-5.4, o3, gpt-4o).
- `DEFAULT_VENUE` = `ICML` — Default target venue when none is specified.
- `COMPOSITE_WEIGHTS` = `{novelty: 0.25, venue: 0.35, strategic: 0.20, feasibility: 0.20}` — Weights for the composite score. Overridable via `--weights:`.
- `PROCEED_THRESHOLD` = `7.0` — Composite score at or above this triggers a PROCEED recommendation.
- `CAUTION_THRESHOLD` = `5.0` — Composite score between this and `PROCEED_THRESHOLD` triggers PROCEED WITH CAUTION. Below this triggers ABANDON.

This skill combines three evaluation modules to screen research ideas before the researcher commits time and resources to execution. Each idea passes through all three modules, producing a composite score and a ranked recommendation.
Module A is adapted from the ARIS novelty-check skill: it systematically verifies whether each idea's core claims are genuinely novel against recent literature. The final output is a ranked list of ideas with per-idea breakdowns, composite scores, and actionable recommendations.
Inputs:

- `$ARGUMENTS` — Either:
  - a path to an ideas file (e.g., `outputs/IDEAS_FILTERED.md` from `/idea-gen`), or
  - inline idea text.
- `--venue:` directive — Target venue for Module B. Valid values: `ICML`, `VLDB`, `NeurIPS`, `all`. Default: `ICML`.
- `--weights:` directive — Override composite weights (e.g., `--weights: novelty=0.3, venue=0.3, strategic=0.2, feasibility=0.2`).
- `outputs/LANDSCAPE.json` — If available from a prior `/lit-survey` run, read it for novelty cross-referencing. If not found, skip silently.

Setup:

1. If `$ARGUMENTS` points to a file path, read and parse that file. Extract each idea as a separate entity (look for headings, numbered lists, or `### Idea N:` patterns).
2. If `$ARGUMENTS` is inline text, treat each paragraph or clearly delimited section as a separate idea.
3. Parse `--venue:` from the arguments. If absent, set `TARGET_VENUE = DEFAULT_VENUE`.
4. Parse `--weights:` from the arguments. If absent, use `COMPOSITE_WEIGHTS`.
5. Check for `outputs/LANDSCAPE.json`. If found, load the paper list for cross-referencing in Module A.

**Module A: Novelty Check.** Adapted from the ARIS novelty-check skill. For EACH idea, execute Phases A through D.
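The directive parsing in the setup steps can be sketched as a minimal helper. This is illustrative only: `parse_directives` is a hypothetical name, and the directive syntax is assumed to follow the examples in this document.

```python
import re

# Defaults mirroring the skill's constants (DEFAULT_VENUE, COMPOSITE_WEIGHTS).
DEFAULT_VENUE = "ICML"
DEFAULT_WEIGHTS = {"novelty": 0.25, "venue": 0.35, "strategic": 0.20, "feasibility": 0.20}

def parse_directives(arguments: str):
    """Extract --venue: and --weights: directives from $ARGUMENTS (hypothetical helper)."""
    venue_match = re.search(r"--\s*venue:\s*(\w+)", arguments)
    venue = venue_match.group(1) if venue_match else DEFAULT_VENUE

    weights = dict(DEFAULT_WEIGHTS)
    weights_match = re.search(r"--\s*weights:\s*([\w=. ,]+)", arguments)
    if weights_match:
        # e.g. "novelty=0.3, venue=0.3, strategic=0.2, feasibility=0.2"
        for pair in weights_match.group(1).split(","):
            key, _, value = pair.strip().partition("=")
            if key in weights and value:
                weights[key] = float(value)
    return venue, weights
```

Unrecognized weight keys are ignored rather than rejected; a production version would validate that the weights sum to 1.0.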
For EACH core claim, search using ALL available sources:
- Web search (via `WebSearch`):
- Cross-reference `LANDSCAPE.json`: If `outputs/LANDSCAPE.json` was loaded, check whether any papers already found in Stage 1 (lit-survey) overlap with the current claim. Skip re-fetching those — use the cached metadata. Flag any overlapping papers for detailed comparison.
- For each flagged paper, use `WebFetch` to retrieve the abstract and (where possible) the related work or introduction section. This is critical for determining whether the overlap is superficial or fundamental.

Search failure handling:
Call REVIEWER_MODEL via Codex MCP (mcp__codex__codex) with xhigh reasoning effort:
```yaml
mcp__codex__codex:
  model: REVIEWER_MODEL
  config: {"model_reasoning_effort": "xhigh"}
  prompt: |
    I need to verify the novelty of a research idea.

    Proposed idea: [IDEA DESCRIPTION]

    Papers found that may overlap:
    [LIST EACH PAPER: title, authors, year, venue, abstract summary]

    Core claims to verify:
    [LIST EACH CLAIM]

    For each core claim, answer THREE questions:
    1. Has this EXACT mechanism been published? (cite specific paper + section if yes)
    2. Has a CLOSELY RELATED mechanism been published that achieves the same goal through a different path? (cite + explain degree of overlap)
    3. Would a reviewer at [TARGET_VENUE] consider this sufficiently novel? (yes/no + reasoning)

    Overall novelty assessment:
    - Score: 0-10 (where 10 = completely unprecedented, 0 = already published verbatim)
    - Recommendation: PROCEED / PROCEED WITH CAUTION / ABANDON
    - Key differentiator (what, if anything, makes this unique)
    - Suggested positioning to maximize novelty perception at the target venue
```
Produce the following structured report for each idea:
### Novelty: [Idea Title]
- **Score**: X/10
- **Recommendation**: PROCEED / PROCEED WITH CAUTION / ABANDON
- **Core Claims**:
1. [Claim 1] — Novelty: HIGH/MEDIUM/LOW — Closest: [paper title, year]
2. [Claim 2] — Novelty: HIGH/MEDIUM/LOW — Closest: [paper title, year]
3. [Claim 3] — Novelty: HIGH/MEDIUM/LOW — Closest: [paper title, year]
- **Closest Prior Work**:
| Paper | Year | Venue | Overlap | Key Difference |
|-------|------|-------|---------|----------------|
| ... | ... | ... | ... | ... |
- **Key differentiator**: [what makes this unique, if anything]
- **Suggested positioning**: [how to frame the contribution to maximize novelty perception]
This is the key innovation of the screening skill. It adapts the ICML/VLDB multi-reviewer prompt system to evaluate ideas (not finished papers), answering the core question: "If this idea is executed correctly, would the resulting paper be accepted at the target venue?"
Read the venue profile from `venue-profiles/{VENUE}.md` (e.g., `venue-profiles/ICML.md`). If the file does not exist, fall back to the generic profile below.

- `--venue: ICML` → use the ICML profile.
- `--venue: VLDB` → use the VLDB profile.
- `--venue: NeurIPS` → use the NeurIPS profile.
- `--venue: all` → run against ALL available venue profiles and produce comparative results.
- No `--venue:` directive → use `DEFAULT_VENUE` (ICML).

If no venue profile file is found, use this generic "top ML venue" profile:
Calibration Tiers:
Reviewer Profiles:
Verdict Options: Strong Reject, Reject, Weak Reject, Weak Accept, Accept, Strong Accept
For EACH idea, call the external LLM:
```yaml
mcp__codex__codex:
  model: REVIEWER_MODEL
  config: {"model_reasoning_effort": "xhigh"}
  prompt: |
    You are simulating the review committee of [VENUE_NAME] ([VENUE_FULL_NAME]).
    You are reviewing a **research idea** (not a finished paper). The core question is:
    "If this idea is executed correctly, could the resulting paper be published at [VENUE_NAME]?"

    ## Review Calibration Standards
    [INJECT CALIBRATION TIERS FROM VENUE PROFILE]

    ## Reviewer Profiles
    [INJECT REVIEWER PROFILES FROM VENUE PROFILE]

    === IDEA ===
    Title: [title]
    Thesis: [one-sentence thesis]
    Problem: [gap addressed]
    Core Mechanism: [key technical insight]
    Contribution Type: [empirical/method/theory/diagnostic]
    Closest Work: [paper + delta, from Module A]
    Novelty Score: [X/10, from Module A]
    === END IDEA ===

    For EACH reviewer, output:
    1. **Calibration Tier**: Tier 1/2/3, with reasoning
    2. **Strengths**: strengths from this reviewer's perspective (at least 2 specific points)
    3. **Critical Weaknesses**: 2-3 specific, actionable weaknesses (no vague generalities)
    4. **Verdict**: [choose from verdict_options: Strong Reject / Reject / Weak Reject / Weak Accept / Accept / Strong Accept]
    5. **"What it would take for me to Accept"**: 1-2 sentences telling the author exactly what is needed

    Then write a **Meta Review**:
    - The core disagreement among reviewers (if any). The reviewers should disagree; do not make all three unanimous.
    - Final verdict: [choose from verdict_options]
    - If rejected: what tier of venue does this idea suit? (e.g., "suitable for AAAI/IJCAI" or "resubmit to a workshop")
    - If accepted: what would it take to contend for Best Paper?
    - Top 3 execution risks (technical risk, experimental risk, positioning risk)
```
Codex MCP failure handling: If mcp__codex__codex is unavailable:
Convert each reviewer's verdict to a numeric score:
| Verdict | Score |
|---|---|
| Strong Reject | 1 |
| Reject | 3 |
| Weak Reject | 4 |
| Weak Accept | 6 |
| Accept | 8 |
| Strong Accept | 10 |
Venue Score = average of 3 reviewer verdict scores (rounded to 1 decimal place).
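The verdict conversion and averaging above can be sketched as follows (a minimal sketch; the function name is an assumption):

```python
# Verdict-to-score mapping from the table above.
VERDICT_SCORES = {
    "Strong Reject": 1, "Reject": 3, "Weak Reject": 4,
    "Weak Accept": 6, "Accept": 8, "Strong Accept": 10,
}

def venue_score(verdicts):
    """Average the reviewers' verdict scores, rounded to 1 decimal place."""
    return round(sum(VERDICT_SCORES[v] for v in verdicts) / len(verdicts), 1)
```

For example, verdicts of Weak Accept, Accept, and Weak Reject yield (6 + 8 + 4) / 3 = 6.0.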
This module extends Prof. Bingsheng He's research ideation framework into a quantified five-dimension evaluation. Claude performs this assessment directly — no external LLM call needed.
For EACH idea, evaluate on these 5 dimensions (1-10 each):
**Dimension 1: Longevity.** Is the core problem persistent, or is it a transient fad?
| Score Range | Meaning |
|---|---|
| 1-3 | Likely obsolete within 1 year (e.g., tied to a specific model version or API) |
| 4-6 | Relevant for 2-3 years (e.g., current architectural paradigm) |
| 7-10 | Addresses a fundamental, long-standing problem (e.g., generalization, efficiency, interpretability) |
Ask: "Will researchers still care about this problem in 5 years?"
**Dimension 2: Roadmap Viability.** Can this idea grow into a multi-paper research arc?
| Score Range | Meaning |
|---|---|
| 1-3 | One-off finding, no natural follow-up work |
| 4-6 | One clear extension paper possible |
| 7-10 | Opens a new sub-area; 3+ papers are naturally achievable (foundational contribution → extensions → system/application) |
Ask: "After Paper 1, what are Papers 2 and 3?"
**Dimension 3: Application Grounding.** Does the idea connect to real-world needs?
| Score Range | Meaning |
|---|---|
| 1-3 | Pure theoretical curiosity with no foreseeable application |
| 4-6 | Benchmark-only demonstration (e.g., improves CIFAR-10 accuracy) |
| 7-10 | Clear industry or societal application (e.g., healthcare, sustainability, production ML systems) |
Ask: "Who outside academia would care about this result?"
**Dimension 4: Execution Uniqueness.** Does the researcher (or team) have a unique advantage in executing this idea?
| Score Range | Meaning |
|---|---|
| 1-3 | Any competent team could do this equally well; high risk of being scooped |
| 4-6 | Moderate advantage (e.g., some relevant prior work, partial infrastructure) |
| 7-10 | Strong unique position (e.g., proprietary data, unique computational resources, rare domain expertise, established collaboration) |
Ask: "Why should THIS team pursue this, rather than a team at Google/DeepMind/FAIR?"
**Dimension 5: Iteration Readiness.** How fast is the experimental feedback loop?
| Score Range | Meaning |
|---|---|
| 1-3 | Each iteration takes weeks (e.g., large-scale pre-training, human studies) |
| 4-6 | Each iteration takes days (e.g., medium-scale training, moderate compute) |
| 7-10 | Iterations within hours; rapid signal on whether the idea works (e.g., small diagnostic experiments, existing benchmarks, fast prototyping) |
Ask: "How quickly can we know if this idea is working or dead?"
Strategic Score = average of all 5 dimension scores (rounded to 1 decimal place).
### Strategic Fit: [Idea Title]
- **Strategic Score**: X.X/10
- **Dimensions**:
| Dimension | Score | Justification |
|-----------|-------|---------------|
| Longevity | X/10 | [1-2 sentences] |
| Roadmap Viability | X/10 | [1-2 sentences] |
| Application Grounding | X/10 | [1-2 sentences] |
| Execution Uniqueness | X/10 | [1-2 sentences] |
| Iteration Readiness | X/10 | [1-2 sentences] |
- **Strategic recommendation**: [1-2 sentences on whether this is a good bet for the researcher]
After all 3 modules complete for each idea, compute the composite score:
```
COMPOSITE = (
    COMPOSITE_WEIGHTS.novelty     * Novelty_Score     +  # 0-10 from Module A
    COMPOSITE_WEIGHTS.venue       * Venue_Score       +  # 0-10 from Module B
    COMPOSITE_WEIGHTS.strategic   * Strategic_Score   +  # 0-10 from Module C
    COMPOSITE_WEIGHTS.feasibility * Feasibility_Score    # 0-10 carried from idea-gen
)
```
Where:
- `Feasibility_Score` is carried over from the `/idea-gen` output. If not available (e.g., ideas were provided directly), Claude estimates feasibility on a 0-10 scale based on: computational requirements, data availability, timeline, and implementation complexity.

Default weights: `novelty=0.25, venue=0.35, strategic=0.20, feasibility=0.20`
Override with: `--weights: novelty=0.3, venue=0.3, strategic=0.2, feasibility=0.2`
| Composite Score | Recommendation | Action |
|---|---|---|
| >= 7.0 (PROCEED_THRESHOLD) | PROCEED | Move to /idea-refine for detailed development |
| 5.0 - 6.9 (CAUTION to PROCEED range) | PROCEED WITH CAUTION | Address specific weaknesses first; consider /lit-survey on flagged sub-topics |
| < 5.0 (below CAUTION_THRESHOLD) | ABANDON | Document for future reference; do not invest further effort |
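The composite computation and the threshold mapping above can be sketched as (illustrative; the function names are assumptions):

```python
PROCEED_THRESHOLD = 7.0
CAUTION_THRESHOLD = 5.0

def composite_score(scores: dict, weights: dict) -> float:
    """Weighted sum of the four 0-10 component scores, rounded to 2 decimals."""
    return round(sum(weights[k] * scores[k] for k in weights), 2)

def recommendation(composite: float) -> str:
    """Map a composite score to the three-way recommendation."""
    if composite >= PROCEED_THRESHOLD:
        return "PROCEED"
    if composite >= CAUTION_THRESHOLD:
        return "PROCEED WITH CAUTION"
    return "ABANDON"
```

With the default weights, component scores of 8.5 / 7.2 / 8.0 / 7.5 (novelty / venue / strategic / feasibility) give a composite of roughly 7.7, which maps to PROCEED.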
All-ABANDON fallback: If ALL ideas score below CAUTION_THRESHOLD (5.0):
`outputs/LANDSCAPE.json` if available.

When screening multiple ideas, Module A for different ideas can run concurrently. The constraint is: for a single idea, Module A must finish before Module B starts (since Module B's prompt includes the novelty score and closest prior work from Module A).
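The scheduling constraint can be sketched with `asyncio`. This is a toy sketch with placeholder module functions and dummy scores; the real modules perform web searches and Codex MCP calls.

```python
import asyncio

async def module_a(idea: str) -> dict:
    # Placeholder for the novelty check; the real module runs searches.
    await asyncio.sleep(0)
    return {"idea": idea, "novelty": 7.5}  # dummy score

async def module_b(novelty_report: dict) -> dict:
    # Placeholder for the venue simulation; its prompt needs Module A's output.
    await asyncio.sleep(0)
    return {**novelty_report, "venue": 6.8}  # dummy score

async def screen(idea: str) -> dict:
    # Within a single idea, Module A must complete before Module B starts.
    return await module_b(await module_a(idea))

async def screen_all(ideas: list) -> list:
    # Across ideas, the per-idea pipelines run concurrently.
    return await asyncio.gather(*(screen(i) for i in ideas))

results = asyncio.run(screen_all(["Idea 1", "Idea 2"]))
```

`asyncio.gather` preserves input order, so results line up with the original idea list.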
Create the outputs/ directory if it does not exist:
```bash
mkdir -p outputs
```
`outputs/SCREENING_REPORT.md`: Full detailed report with a per-idea breakdown across all 3 modules.
# Screening Report
**Direction**: [research direction]
**Venue**: [target venue]
**Date**: [YYYY-MM-DD]
**Ideas screened**: N
**Composite weights**: novelty=X, venue=X, strategic=X, feasibility=X
## Executive Summary
[2-3 paragraphs summarizing the screening results. How many ideas passed? What are the top recommendations? Any surprises?]
## Per-Idea Reports
### Idea 1: [Title] — [PROCEED/CAUTION/ABANDON]
#### Module A: Novelty Assessment
[Full Phase D novelty report]
#### Module B: Venue Reviewer Simulation ([VENUE])
[Full 3-reviewer + meta-review output]
#### Module C: Strategic Fit Assessment
[Full 5-dimension strategic report]
#### Composite Score
| Component | Score | Weight | Weighted |
|-----------|-------|--------|----------|
| Novelty | X/10 | 0.25 | X.XX |
| Venue | X/10 | 0.35 | X.XX |
| Strategic | X/10 | 0.20 | X.XX |
| Feasibility | X/10 | 0.20 | X.XX |
| **Composite** | | | **X.XX** |
**Recommendation**: PROCEED / PROCEED WITH CAUTION / ABANDON
---
### Idea 2: [Title] — [PROCEED/CAUTION/ABANDON]
[repeat structure]
---
[repeat for all ideas]
`outputs/SCREENING_RANKED.md`: Concise ranked summary for quick reference and handoff to downstream skills.
# Screening Results: Ranked Ideas
**Direction**: [direction]
**Venue**: [venue]
**Date**: [YYYY-MM-DD]
**Ideas screened**: N
## Rankings
| Rank | Idea | Novelty | Venue Score | Strategic | Feasibility | Composite | Recommendation |
|------|------|---------|-------------|-----------|-------------|-----------|----------------|
| 1 | ... | 8.5 | 7.2 | 8.0 | 7.5 | 7.8 | PROCEED |
| 2 | ... | 7.0 | 6.8 | 7.5 | 8.0 | 7.2 | PROCEED |
| 3 | ... | 6.0 | 5.5 | 6.0 | 7.0 | 6.0 | CAUTION |
| 4 | ... | 4.0 | 3.5 | 5.0 | 6.0 | 4.4 | ABANDON |
## Detailed Per-Idea Reports
### Rank 1: [Title] — PROCEED
#### Module A: Novelty
- Score: X/10
- Key differentiator: [what makes it unique]
- Closest prior work: [paper, year, delta]
#### Module B: Venue Simulation ([VENUE])
- Reviewer 1 ([persona]): [Verdict] — [1-line summary]
- Reviewer 2 ([persona]): [Verdict] — [1-line summary]
- Reviewer 3 ([persona]): [Verdict] — [1-line summary]
- Meta-review: [Final verdict] — [1-line summary]
- Top risk: [the single biggest execution risk]
#### Module C: Strategic Fit
- Longevity: X/10 — [1 line]
- Roadmap Viability: X/10 — [1 line]
- Application Grounding: X/10 — [1 line]
- Execution Uniqueness: X/10 — [1 line]
- Iteration Readiness: X/10 — [1 line]
---
### Rank 2: [Title] — PROCEED
[repeat structure]
---
[repeat for all ideas, in rank order]
## Next Steps
### For PROCEED ideas:
- Run `/idea-refine` to develop detailed research plans, experimental designs, and paper outlines.
### For PROCEED WITH CAUTION ideas:
- Run `/lit-survey` on the specific sub-topics flagged as weak by the reviewers.
- Address the critical weaknesses identified in Module B before proceeding.
- Re-screen after improvements.
### For ABANDON ideas:
- Documented here for future reference.
- May revisit if the landscape changes (new tools, new datasets, paradigm shifts).
- Consider whether a sub-component of the idea could be extracted and developed independently.
If `Write` fails due to file size, fall back to Bash with a heredoc:

```bash
cat << 'SCREENING_EOF' > outputs/SCREENING_REPORT.md
[content]
SCREENING_EOF
```
- If `Feasibility_Score` is not available from the `/idea-gen` output, estimate it based on: computational requirements, data availability, timeline to first results, and implementation complexity.
- If `Write` fails, use Bash with a heredoc. The screening report can be long for multiple ideas.

Pipeline position: `/lit-survey` → `/idea-gen` → `/idea-screen` (you are here) → `/idea-refine`
- From `/lit-survey`: `outputs/LANDSCAPE.json` — paper database for novelty cross-referencing.
- From `/idea-gen`: `outputs/IDEAS_FILTERED.md` — candidate ideas with feasibility scores.
- To `/idea-refine`: `outputs/SCREENING_RANKED.md` — ranked ideas with detailed assessments, ready for refinement.

The screening skill is the critical quality gate in the pipeline. Its purpose is to prevent the researcher from investing weeks in an idea that is not novel, unlikely to survive peer review, or strategically unsound. Be rigorous. Be honest. Save the researcher's time.