Name: Idea Creator
Author: wanshuiyin

gpt-5.4

o3

gpt-4o

if research-wiki/query_pack.md exists AND is less than 7 days old:
    Read query_pack.md and use it as initial landscape context:
    - Treat listed gaps as priority search seeds
    - Treat failed ideas as a banlist (do NOT regenerate similar ideas)
    - Treat top papers as known prior work (do not re-search them)
    Still run Phase 1 below for papers from the last 3-6 months (wiki may be stale)
else if research-wiki/ exists but query_pack.md is stale or missing:
    python3 tools/research_wiki.py rebuild_query_pack research-wiki/
    Then read query_pack.md as above
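The freshness check above can be sketched in Python. This is a minimal sketch, not the skill's own implementation: the function name `query_pack_status` and its return labels are hypothetical, and it assumes the file's modification time is an acceptable proxy for the age of the query pack.

```python
import time
from pathlib import Path

MAX_AGE_DAYS = 7  # freshness window stated in the pseudocode above


def query_pack_status(wiki_dir, now=None):
    """Return 'fresh', 'rebuild', or 'no_wiki' for a research-wiki directory.

    Hypothetical helper: age is taken from the file's mtime (an assumption).
    """
    wiki = Path(wiki_dir)
    if not wiki.is_dir():
        return "no_wiki"  # wiki not active: skip Phase 0 entirely
    pack = wiki / "query_pack.md"
    if not pack.is_file():
        return "rebuild"  # wiki exists but the pack is missing
    now = time.time() if now is None else now
    age_days = (now - pack.stat().st_mtime) / 86400
    return "fresh" if age_days < MAX_AGE_DAYS else "rebuild"
```

A result of `"rebuild"` corresponds to the branch that runs `rebuild_query_pack` before reading the file; `"fresh"` corresponds to reading `query_pack.md` directly.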

mcp__codex__codex:
  model: REVIEWER_MODEL
  config: {"model_reasoning_effort": "xhigh"}
  prompt: |
    You are a senior ML researcher brainstorming research ideas.

    Research direction: [user's direction]

    Here is the current landscape:
    [paste landscape map from Phase 1]

    Key gaps identified:
    [paste gaps from Phase 1]

    Generate 8-12 concrete research ideas. For each idea:
    1. One-sentence summary
    2. Core hypothesis (what you expect to find and why)
    3. Minimum viable experiment (what's the cheapest way to test this?)
    4. Expected contribution type: empirical finding / new method / theoretical result / diagnostic
    5. Risk level: LOW (likely works) / MEDIUM (50-50) / HIGH (speculative)
    6. Estimated effort: days / weeks / months

    Prioritize ideas that are:
    - Testable with moderate compute (8x RTX 3090 or less)
    - Likely to produce a clear positive OR negative result (both are publishable)
    - Not "apply X to Y" unless the application reveals genuinely surprising insights
    - Differentiated from the 10-15 papers above

    Be creative but grounded. A great idea is one where the answer matters regardless of which way it goes.
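To make the six requested fields easier to filter and rank in later phases, each generated idea could be held in a small record. The sketch below is illustrative only: the `Idea` class and its field names are hypothetical, while the allowed values are copied from the prompt above.

```python
from dataclasses import dataclass

# Allowed values taken from the prompt text above.
CONTRIBUTION_TYPES = {"empirical finding", "new method", "theoretical result", "diagnostic"}
RISK_LEVELS = {"LOW", "MEDIUM", "HIGH"}
EFFORT_LEVELS = {"days", "weeks", "months"}


@dataclass
class Idea:
    """Hypothetical container for one brainstormed idea."""
    summary: str         # 1. one-sentence summary
    hypothesis: str      # 2. core hypothesis
    min_experiment: str  # 3. minimum viable experiment
    contribution: str    # 4. expected contribution type
    risk: str            # 5. risk level
    effort: str          # 6. estimated effort

    def __post_init__(self):
        # Reject values the prompt does not allow, so malformed output is caught early.
        if self.contribution not in CONTRIBUTION_TYPES:
            raise ValueError(f"unknown contribution type: {self.contribution}")
        if self.risk not in RISK_LEVELS:
            raise ValueError(f"unknown risk level: {self.risk}")
        if self.effort not in EFFORT_LEVELS:
            raise ValueError(f"unknown effort estimate: {self.effort}")
```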

Novelty check: Use the /novelty-check workflow (multi-source search + GPT-5.4 cross-verification) for each idea

Critical review: Use GPT-5.4 via mcp__codex__codex-reply (same thread):

Here are our top ideas after filtering:
[paste surviving ideas with novelty check results]

For each, play devil's advocate:
- What's the strongest objection a reviewer would raise?
- What's the most likely failure mode?
- How would you rank these for a top venue submission?
- Which 2-3 would you actually work on?

Combine rankings: Merge your assessment with GPT-5.4's ranking. Select top 2-3 ideas for pilot experiments.
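The text does not say how the two rankings should be merged. One simple option, shown here purely as an assumption, is to sum each idea's rank position in both lists and keep the lowest totals. The function `merge_rankings` and its tie-breaking rule (alphabetical) are hypothetical.

```python
def merge_rankings(own, reviewer, top_k=3):
    """Merge two ranked lists of idea names by summed rank position.

    Lower totals are better. An idea missing from one list gets the worst
    possible rank there. This merging rule is an assumption, not part of
    the original workflow.
    """
    ideas = set(own) | set(reviewer)
    worst = len(ideas)  # penalty rank for an idea absent from a list

    def rank_in(ranking, idea):
        return ranking.index(idea) if idea in ranking else worst

    # Sort by summed rank, breaking ties alphabetically for determinism.
    merged = sorted(ideas, key=lambda i: (rank_in(own, i) + rank_in(reviewer, i), i))
    return merged[:top_k]
```

For example, `merge_rankings(["A", "B", "C", "D"], ["B", "A", "D", "C"])` keeps `A`, `B`, and `C`, since `A` and `B` tie for the best total and `C` wins the tie with `D` alphabetically.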

Design pilots: For each top idea, define the minimal experiment that would give a positive or negative signal:
- Single seed, small scale (e.g., small dataset subset, fewer epochs)
- Target runtime: between 30 minutes and PILOT_MAX_HOURS per pilot on 1 GPU
- Estimate GPU-hours BEFORE launching. If the estimate exceeds PILOT_MAX_HOURS, reduce scale (fewer epochs, smaller subset) or flag the pilot as "needs manual pilot"
- Clear success metric defined upfront (e.g., "if metric improves by > 1%, signal is positive")
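The estimate-then-reduce rule in the list above can be sketched as a small planning function. This is a rough sketch under stated assumptions: `plan_pilot` and `min_fraction` are hypothetical names, and the code assumes runtime scales roughly linearly with dataset size or epoch count, which will not hold for every workload.

```python
def plan_pilot(est_hours, pilot_max_hours, min_fraction=0.1):
    """Return (scale_fraction, status) for a pilot given its estimated runtime.

    min_fraction is a hypothetical floor: below it the pilot is assumed to be
    too small to give a meaningful signal and is flagged for manual handling.
    """
    if est_hours <= pilot_max_hours:
        return 1.0, "run"
    # Assumes runtime shrinks linearly with the data/epoch fraction.
    fraction = pilot_max_hours / est_hours
    if fraction < min_fraction:
        return fraction, "needs manual pilot"
    return fraction, "run reduced"
```

With a 2-hour limit, a pilot estimated at 4 hours would run at half scale, while one estimated at 100 hours would be flagged instead of launched.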
Deploy in parallel: Use /run-experiment to launch pilots on different GPUs simultaneously:
```
GPU 0: Pilot for Idea 1
GPU 1: Pilot for Idea 2
GPU 2: Pilot for Idea 3
```
Use run_in_background: true to launch all at once.
Collect results: Use /monitor-experiment to check progress. If any pilot exceeds PILOT_TIMEOUT_HOURS, kill it and collect partial results. Once all pilots complete (or time out), compare:
- Which ideas showed positive signal?
- Which showed null/negative results? (eliminate or deprioritize)
- Any surprising findings that suggest a pivot?
- Total GPU-hours consumed (track against MAX_TOTAL_GPU_HOURS budget)
Re-rank based on empirical evidence: Update the idea ranking using pilot results. An idea with a strong pilot signal ranks ahead of a theoretically appealing but untested idea.
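A minimal sketch of the re-ranking and budget check described above follows. The signal labels `POSITIVE`, `WEAK POSITIVE`, and `NEGATIVE` match those used in the report template; the `UNTESTED` label, the ordering between labels, and both function names are assumptions made for illustration.

```python
# Lower value = ranked earlier. Placing untested ideas between weak
# positives and negatives is an assumption, not stated in the workflow.
SIGNAL_ORDER = {"POSITIVE": 0, "WEAK POSITIVE": 1, "UNTESTED": 2, "NEGATIVE": 3}


def rerank(prior_ranking, pilot_signals):
    """Sort ideas by pilot signal first, then by their pre-pilot rank."""
    def key(idea):
        signal = pilot_signals.get(idea, "UNTESTED")
        return (SIGNAL_ORDER[signal], prior_ranking.index(idea))
    return sorted(prior_ranking, key=key)


def within_budget(gpu_hours_used, max_total_gpu_hours):
    """Check total GPU-hours consumed against the MAX_TOTAL_GPU_HOURS budget."""
    return sum(gpu_hours_used.values()) <= max_total_gpu_hours
```

Under this ordering, an idea that ranked third before the pilots but returned a positive signal moves to the top, ahead of a higher-ranked idea that was never piloted.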

# Research Idea Report

**Direction**: [user's research direction]
**Generated**: [date]
**Ideas evaluated**: X generated → Y survived filtering → Z piloted → W recommended

## Landscape Summary
[3-5 paragraphs on the current state of the field]

## Recommended Ideas (ranked)

### Idea 1: [title]
- **Hypothesis**: [one sentence]
- **Minimum experiment**: [concrete description]
- **Expected outcome**: [what success/failure looks like]
- **Novelty**: X/10 — closest work: [paper]
- **Feasibility**: [compute, data, implementation estimates]
- **Risk**: LOW/MEDIUM/HIGH
- **Contribution type**: empirical / method / theory / diagnostic
- **Pilot result**: [POSITIVE: metric +X% / NEGATIVE: no signal / SKIPPED: needs GPU]
- **Reviewer's likely objection**: [strongest counterargument]
- **Why we should do this**: [1-2 sentences]

### Idea 2: [title]
...

## Eliminated Ideas (for reference)
| Idea | Reason eliminated |
|------|-------------------|
| ... | Already done by [paper] |
| ... | Requires > 1 week GPU time |
| ... | Result wouldn't be interesting either way |

## Pilot Experiment Results
| Idea | GPU | Time | Key Metric | Signal |
|------|-----|------|------------|--------|
| Idea 1 | GPU 0 | 45 min | +2.3% CE | POSITIVE |
| Idea 2 | GPU 1 | 30 min | -0.1% CE | NEGATIVE |
| Idea 3 | GPU 2 | 1.5 hr | +0.8% CE | WEAK POSITIVE |

## Suggested Execution Order
1. Start with Idea 1 (positive pilot signal, lowest risk)
2. Idea 3 as backup (weak signal, may need larger scale to confirm)
3. Idea 2 eliminated by pilot — negative result documented

## Next Steps
- [ ] Scale up Idea 1 to full experiment (multi-seed, full dataset)
- [ ] If confirmed, invoke /auto-review-loop for full iteration

if research-wiki/ exists:
    for each idea in recommended_ideas + eliminated_ideas:
        1. Create page: research-wiki/ideas/<idea_id>.md
           - node_id: idea:<id>
           - stage: proposed (or: piloted, archived)
           - outcome: unknown (or: negative, mixed, positive)
           - based_on: [paper:<slug>, ...]
           - target_gaps: [gap:<id>, ...]
           - Include: hypothesis, proposed method, expected outcome
           - If pilot was run: actual outcome, failure notes, reusable components

        2. Add edges:
           python3 tools/research_wiki.py add_edge research-wiki/ --from "idea:<id>" --to "paper:<slug>" --type inspired_by --evidence "..."
           python3 tools/research_wiki.py add_edge research-wiki/ --from "idea:<id>" --to "gap:<id>" --type addresses_gap --evidence "..."

    Rebuild query pack:
        python3 tools/research_wiki.py rebuild_query_pack research-wiki/
    Log:
        python3 tools/research_wiki.py log research-wiki/ "idea-creator wrote N ideas (M recommended, K eliminated)"
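The edge-writing step above could be scripted by building one argument list per `add_edge` call. The sketch below only constructs the commands and runs nothing. The helper names are hypothetical, and the flags are copied from the command lines above rather than from the tool's documentation, so they should be checked against the actual `research_wiki.py`.

```python
WIKI_TOOL = ["python3", "tools/research_wiki.py"]  # CLI path as written in the steps above


def add_edge_cmd(wiki_dir, src, dst, edge_type, evidence):
    """Build the argv list for one add_edge call (nothing is executed here)."""
    return WIKI_TOOL + ["add_edge", wiki_dir, "--from", src, "--to", dst,
                        "--type", edge_type, "--evidence", evidence]


def edge_cmds_for_idea(wiki_dir, idea_id, paper_slugs, gap_ids, evidence):
    """Return all add_edge commands for one idea: papers first, then gaps."""
    node = f"idea:{idea_id}"
    cmds = [add_edge_cmd(wiki_dir, node, f"paper:{slug}", "inspired_by", evidence)
            for slug in paper_slugs]
    cmds += [add_edge_cmd(wiki_dir, node, f"gap:{gap}", "addresses_gap", evidence)
             for gap in gap_ids]
    return cmds
```

Each returned list could then be passed to `subprocess.run(cmd, check=True)`. Passing arguments as a list avoids shell quoting problems in the free-text `--evidence` value.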

/idea-creator "direction"     → ranked ideas
/novelty-check "top idea"     → deep novelty verification (already done in Phase 4, but user can re-run)
/research-review "top idea"   → external critical feedback
implement                     → write code
/run-experiment               → deploy to GPU
/auto-review-loop             → iterate until submission-ready

Idea Creator

Research Idea Creator

Overview

Constants

Workflow

Phase 0: Load Research Wiki (if active)

Phase 1: Landscape Survey (5-10 min)

Phase 2: Idea Generation (brainstorm with external LLM)

Phase 3: First-Pass Filtering

Phase 4: Deep Validation (for top ideas)

Phase 5: Parallel Pilot Experiments (for top 2-3 ideas)

Phase 6: Output — Ranked Idea Report

Phase 7: Write Ideas to Research Wiki (if active)

Output Protocols

Key Rules

Composing with Other Skills

Review Tracing
