Generate and rank research ideas given a broad direction. Use when user says "找idea", "brainstorm ideas", "generate research ideas", "what can we work on", or wants to explore a research area for publishable directions.
Generate publishable research ideas for: $ARGUMENTS
Given a broad research direction from the user, systematically generate, validate, and rank concrete research ideas. This skill composes with /research-lit, /novelty-check, and /research-review to form a complete idea discovery pipeline.
The model is determined by the LLM_MODEL setting in .claude/settings.json (default: qwen3.5-plus). Switch to gpt-5.4 / codex / deepseek-r1 or any other OpenAI-compatible model by editing settings.json; the change takes effect immediately. 💡 Defaults can be overridden via argument, e.g., /idea-creator "topic" (pilot budget: 4h per idea, 20h total).
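A minimal sketch of the corresponding .claude/settings.json entries. Only LLM_MODEL, REVIEWER_MODEL, and PILOT_TIMEOUT_HOURS are named by this skill; their placement and the non-default values below are assumptions:

```json
{
  "LLM_MODEL": "qwen3.5-plus",
  "REVIEWER_MODEL": "deepseek-r1",
  "PILOT_TIMEOUT_HOURS": 4
}
```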
## Phase 1: Map the Landscape
Map the research area to understand what exists and where the gaps are.
Scan the local paper library first: Check papers/ and literature/ in the project directory for existing PDFs. Read the first 3 pages of relevant papers to build a baseline understanding before searching online; this avoids re-discovering what the user already knows.
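A minimal sketch of this scan, assuming PDFs may sit anywhere under those two directories (the recursive layout is an assumption):

```python
from pathlib import Path

# Collect PDFs already in the local library before searching online.
library = [Path("papers"), Path("literature")]
pdfs = sorted(p for d in library if d.is_dir() for p in d.rglob("*.pdf"))
print(f"{len(pdfs)} local papers to skim (first 3 pages each)")
```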
Search recent literature using WebSearch: collect the 10-15 most relevant recent papers beyond what the local library already covers.
Build a landscape map: summarize what each paper does, which approaches dominate, and where results disagree.
Identify structural gaps: questions the surveyed papers leave open or answer only in part.
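For concreteness, the landscape map might be a small table like this (all entries are placeholders):

| Paper | Approach | Key result | Gap it leaves |
|-------|----------|------------|---------------|
| ... | ... | ... | ... |
| ... | ... | ... | ... |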
## Phase 2: Divergent Generation
Use the external LLM via llm-chat MCP for divergent thinking:
mcp__llm-chat__chat:
# model: configured in settings.json; no need to specify it manually
prompt: |
You are a senior ML researcher brainstorming research ideas.
Research direction: [user's direction]
Here is the current landscape:
[paste landscape map from Phase 1]
Key gaps identified:
[paste gaps from Phase 1]
Generate 8-12 concrete research ideas. For each idea:
1. One-sentence summary
2. Core hypothesis (what you expect to find and why)
3. Minimum viable experiment (what's the cheapest way to test this?)
4. Expected contribution type: empirical finding / new method / theoretical result / diagnostic
5. Risk level: LOW (likely works) / MEDIUM (50-50) / HIGH (speculative)
6. Estimated effort: days / weeks / months
Prioritize ideas that are:
- Testable with moderate compute (8x RTX 3090 or less)
- Likely to produce a clear positive OR negative result (both are publishable)
- Not "apply X to Y" unless the application reveals genuinely surprising insights
- Differentiated from the 10-15 papers above
Be creative but grounded. A great idea is one where the answer matters regardless of which way it goes.
Save the response for use in follow-up calls (include as context in subsequent prompts).
## Phase 3: Quick Triage
For each generated idea, quickly evaluate:
Feasibility check: Can we actually run this experiment with available resources?
Novelty quick-check: For each idea, do 2-3 targeted searches to see if it's already been done. Full /novelty-check comes later for survivors.
Impact estimation: Would a reviewer care about the result?
Eliminate ideas that fail any of these. Typically 8-12 ideas reduce to 4-6.
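A sketch of how the triage outcome might be recorded (rows and verdicts are placeholders):

| Idea | Feasible? | Novel at a glance? | Reviewer would care? | Verdict |
|------|-----------|--------------------|----------------------|---------|
| ... | yes | yes | yes | keep |
| ... | yes | already done by [paper] | n/a | drop |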
## Phase 4: Deep Validation
For each surviving idea, run a deeper evaluation:
Novelty check: Use the /novelty-check workflow (multi-source search + REVIEWER_MODEL cross-verification) for each idea.
Critical review: Use REVIEWER_MODEL via mcp__llm-chat__chat (include landscape + ideas as context):
Here are our top ideas after filtering:
[paste surviving ideas with novelty check results]
For each, play devil's advocate:
- What's the strongest objection a reviewer would raise?
- What's the most likely failure mode?
- How would you rank these for a top venue submission?
- Which 2-3 would you actually work on?
Combine rankings: Merge your assessment with REVIEWER_MODEL's ranking. Select top 2-3 ideas for pilot experiments.
## Phase 5: Pilot Experiments
Before committing to a full research effort, run cheap pilot experiments to get empirical signal. This is the key differentiator from paper-only validation.
Design pilots: For each top idea, define the minimal experiment that would give a clear positive or negative signal, e.g., train the baseline and the variant briefly on a data subset and compare the key metric.
Deploy in parallel: Use /run-experiment to launch pilots on different GPUs simultaneously:
GPU 0: Pilot for Idea 1
GPU 1: Pilot for Idea 2
GPU 2: Pilot for Idea 3
Use run_in_background: true to launch all at once.
Collect results: Use /monitor-experiment to check progress. If any pilot exceeds PILOT_TIMEOUT_HOURS, kill it and collect partial results. Once all pilots complete (or time out), compare key metrics and signal strength across pilots.
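A minimal sketch of the launch-and-collect step, assuming one standalone script per pilot (the script names, log layout, and the 4-hour timeout value are placeholder assumptions):

```python
import os
import subprocess
import time

PILOT_TIMEOUT_HOURS = 4  # placeholder; read this from settings.json in practice

os.makedirs("logs", exist_ok=True)

# Launch each pilot on its own GPU; all three run in the background at once.
scripts = ["pilot_idea1.py", "pilot_idea2.py", "pilot_idea3.py"]  # hypothetical names
procs = []
for gpu, script in enumerate(scripts):
    log = open(f"logs/pilot_gpu{gpu}.log", "w")
    procs.append(subprocess.Popen(
        ["python", script],
        env={**os.environ, "CUDA_VISIBLE_DEVICES": str(gpu)},
        stdout=log, stderr=subprocess.STDOUT,
    ))

# Collect results, killing any pilot that exceeds the shared deadline.
deadline = time.time() + PILOT_TIMEOUT_HOURS * 3600
for proc in procs:
    try:
        proc.wait(timeout=max(0.0, deadline - time.time()))
    except subprocess.TimeoutExpired:
        proc.terminate()  # partial results remain in the log file
```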
Re-rank based on empirical evidence: Update the idea ranking using pilot results. An idea with strong pilot signal jumps ahead of a theoretically appealing but untested idea.
Note: Skip this phase if the ideas are purely theoretical or if no GPU is available. Flag skipped ideas as "needs pilot validation" in the report.
## Phase 6: Write the Report
Write a structured report to IDEA_REPORT.md in the project root:
# Research Idea Report
**Direction**: [user's research direction]
**Generated**: [date]
**Ideas evaluated**: X generated → Y survived filtering → Z piloted → W recommended
## Landscape Summary
[3-5 paragraphs on the current state of the field]
## Recommended Ideas (ranked)
### Idea 1: [title]
- **Hypothesis**: [one sentence]
- **Minimum experiment**: [concrete description]
- **Expected outcome**: [what success/failure looks like]
- **Novelty**: X/10 — closest work: [paper]
- **Feasibility**: [compute, data, implementation estimates]
- **Risk**: LOW/MEDIUM/HIGH
- **Contribution type**: empirical / method / theory / diagnostic
- **Pilot result**: [POSITIVE: metric +X% / NEGATIVE: no signal / SKIPPED: needs GPU]
- **Reviewer's likely objection**: [strongest counterargument]
- **Why we should do this**: [1-2 sentences]
### Idea 2: [title]
...
## Eliminated Ideas (for reference)
| Idea | Reason eliminated |
|------|-------------------|
| ... | Already done by [paper] |
| ... | Requires > 1 week GPU time |
| ... | Result wouldn't be interesting either way |
## Pilot Experiment Results
| Idea | GPU | Time | Key Metric | Signal |
|------|-----|------|------------|--------|
| Idea 1 | GPU 0 | 45 min | +2.3% CE | POSITIVE |
| Idea 2 | GPU 1 | 30 min | -0.1% CE | NEGATIVE |
| Idea 3 | GPU 2 | 1.5 hr | +0.8% CE | WEAK POSITIVE |
## Suggested Execution Order
1. Start with Idea 1 (positive pilot signal, lowest risk)
2. Idea 3 as backup (weak signal, may need larger scale to confirm)
3. Idea 2 eliminated by pilot — negative result documented
## Next Steps
- [ ] Scale up Idea 1 to full experiment (multi-seed, full dataset)
- [ ] If confirmed, invoke /auto-review-loop for full iteration
## Pipeline
After this skill produces the ranked report:
/idea-creator "direction" → ranked ideas
/novelty-check "top idea" → deep novelty verification (already done in Phase 4, but user can re-run)
/research-review "top idea" → external critical feedback
implement → write code
/run-experiment → deploy to GPU
/auto-review-loop → iterate until submission-ready