Generate and rank research ideas given a broad direction. Brainstorms 8-12 ideas via external LLM, filters by feasibility, novelty, impact, and Prof. He's 4-dimension framework. Use when user says "找idea", "brainstorm ideas", "generate research ideas", "想点子", "what can we work on", or wants to explore a research area for publishable directions.
Generate publishable research ideas for: $ARGUMENTS
Given a broad research direction from the user, systematically generate, validate, and rank concrete research ideas. This skill uses an external LLM for divergent brainstorming, then applies multiple filtering layers — feasibility, novelty quick-check, impact estimation, and Prof. Bingsheng He's 4-dimension evaluation framework — to distill 8-12 raw ideas down to 4-6 high-quality, actionable research directions.
This skill is designed to compose with the /lit-survey skill (run first for best results) and feeds into /idea-screen and /idea-refine downstream.
gpt-5.4 — Model used via Codex MCP for brainstorming and review. Must be an OpenAI model (e.g., gpt-5.4, o3, gpt-4o).

The purpose of this phase is to establish enough context for high-quality idea generation. It is NOT a replacement for /lit-survey.
Check for existing landscape artifacts:
- `outputs/LANDSCAPE.json` if it exists
- `outputs/LANDSCAPE.md` if it exists

(both produced by the /lit-survey skill)

If landscape files exist:
- Extract `landscape_summary`, `identified_gaps`, and `key_papers` for use in Phase 2

If landscape files do NOT exist (fallback — abbreviated inline survey):
- Suggest running `/lit-survey [direction]` first.
- Run abbreviated searches:
  - `"[direction] survey" site:arxiv.org`
  - `"[direction] benchmark" NeurIPS OR ICML OR ICLR 2024 2025`
  - `"[direction] limitations" OR "future work"`
  - `"[direction]" state-of-the-art`
- Extract `landscape_summary`, `identified_gaps`, and `key_papers` from the results

Direction specificity check:
Log to outputs/PIPELINE_LOG.md: "⚠️ Direction was broad, auto-narrowed to: [sub1], [sub2], [sub3]"

Use the external LLM via Codex MCP for divergent, high-quality brainstorming. This is the creative core of the skill.
Codex MCP failure handling: If the mcp__codex__codex call fails (tool unavailable, timeout, or error):
Call mcp__codex__codex with the following parameters:
- model: gpt-5.4
- config: `{"model_reasoning_effort": "xhigh"}`
- prompt:

You are a senior ML/systems researcher. Your task is to generate research ideas
that are NOVEL, SPECIFIC, and EVALUATABLE.
Research direction: [user's direction from $ARGUMENTS]
Current landscape (from systematic survey):
[paste landscape_summary — either from LANDSCAPE.md or the mini-survey]
Identified gaps:
[paste identified_gaps — either the Gap Identification Matrix or the mini-survey gaps]
Generate 8-12 concrete research ideas. For each idea, provide:
1. **Title**: A concise, descriptive title (as it would appear on a paper)
2. **One-sentence thesis**: The core claim, stated as "We show that X by Y"
3. **Problem it solves**: Which specific gap from the landscape does this address?
4. **Core mechanism**: The key technical insight (not just "apply X to Y")
5. **Why it is non-obvious**: What would a skeptic's first objection be, and why is it wrong?
6. **Expected contribution type**: empirical finding / new method / theoretical result / diagnostic / new formulation
7. **Risk level**: LOW / MEDIUM / HIGH (with 1-sentence justification)
8. **Estimated effort**: person-weeks to a publishable result
9. **Closest existing work**: The single most similar paper and the precise delta
Quality criteria for generated ideas:
- REJECT "apply X to Y" unless the application reveals a genuinely surprising mechanism
- REJECT ideas where the outcome does not matter (if +3% or -3%, who cares?)
- PREFER ideas where a NEGATIVE result is equally publishable
- PREFER ideas that challenge an assumption the field takes for granted
- PREFER ideas with a clear "skeleton experiment" that takes < 1 week
- Each idea must be differentiated from the landscape papers above
Generate diverse ideas: at least 2 should be HIGH risk / high reward,
at least 2 should be LOW risk / solid contribution, and the rest MEDIUM.
After the call:
If fewer ideas than the minimum were returned, call mcp__codex__codex-reply on the same thread asking for additional ideas to reach the minimum. Record the thread ID (needed for follow-up and for /idea-screen). Number the ideas IDEA-01, IDEA-02, etc.

For each generated idea, perform three quick evaluations. The goal is to eliminate clearly non-viable ideas before investing time in deeper scoring.
For each idea, evaluate:
Mark each idea as: FEASIBLE, FEASIBLE WITH CAVEATS, or INFEASIBLE.
Eliminate INFEASIBLE ideas. Note the reason for elimination.
For each remaining idea, run 2-3 targeted WebSearch queries to check if it has already been done:
For each idea, assign a novelty status:
- LIKELY NOVEL — No close matches found
- NEEDS DEEPER CHECK — Tangentially related work exists, but the exact angle appears unexplored
- ALREADY DONE — A paper doing essentially the same thing was found

Eliminate ALREADY DONE ideas. Note the paper that already covers it.
For each remaining idea, evaluate:
Mark each idea as: HIGH IMPACT, MEDIUM IMPACT, or LOW IMPACT.
Eliminate LOW IMPACT ideas where neither a positive nor negative result would be interesting.
After Phase 3: Typically 8-12 ideas reduce to 5-8 survivors. Record all eliminated ideas and their elimination reasons.
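The bookkeeping for this three-stage quick filter can be sketched as below. This is an illustrative sketch only: the status values mirror the labels defined above, but the record field names (`feasibility`, `novelty`, `impact`, `reason`) are hypothetical, not a fixed schema.

```python
# Apply the Phase 3 quick filters in order: feasibility, then novelty,
# then impact. Each eliminated idea is recorded with its stage and reason,
# matching the "Eliminated Ideas" table in the final report.
def phase3_filter(ideas):
    survivors, eliminated = [], []
    for idea in ideas:
        if idea["feasibility"] == "INFEASIBLE":
            eliminated.append((idea["id"], "Feasibility", idea.get("reason", "")))
        elif idea["novelty"] == "ALREADY DONE":
            eliminated.append((idea["id"], "Novelty", idea.get("reason", "")))
        elif idea["impact"] == "LOW IMPACT":
            eliminated.append((idea["id"], "Impact", idea.get("reason", "")))
        else:
            survivors.append(idea)
    return survivors, eliminated
```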
Apply Prof. Bingsheng He's research idea evaluation framework. This is a structured scoring system that captures dimensions often missed by pure novelty/feasibility analysis.
For each surviving idea from Phase 3, score on four dimensions (1-5 scale each):
| Dimension | Score (1-5) | Scoring Criteria |
|---|---|---|
| Longevity | [1-5] | Will this topic still be relevant in 3-5 years? Score 5 if it addresses a fundamental question. Score 1 if it rides a transient trend that may be obsolete in 1-2 years. |
| Passion alignment | [1-5] | Does this align with the researcher's stated interests, skills, and existing expertise? If the user has not stated preferences, default to score 3. If they have (e.g., "I work on systems" or "I'm interested in theory"), score accordingly. |
| Application potential | [1-5] | Can this strengthen a paper's motivation with real-world impact? Score 5 if it directly improves a deployed system or addresses a practitioner pain point. Score 1 if it is purely theoretical with no foreseeable application. |
| Uniqueness | [1-5] | Can the researcher make a unique contribution here that others cannot easily replicate? Score 5 if the idea leverages a unique dataset, insight, or methodological strength. Score 1 if any well-funded lab could do this faster. |
Composite score = Longevity + Passion + Application + Uniqueness (out of 20).
Elimination rule: Ideas scoring below FILTER_THRESHOLD (12/20) are eliminated. Record the scores and the reason (which dimension(s) dragged the score down).
Dynamic threshold adjustment: If ALL ideas score below FILTER_THRESHOLD (12/20):
After Phase 4: Target SURVIVING_TARGET (4-6) ideas. If more than 6 survive, keep all but note that the top 6 by composite score are recommended. If fewer than 4 survive, revisit eliminated ideas from Phase 3 that scored MEDIUM IMPACT or FEASIBLE WITH CAVEATS and re-evaluate with the He framework — some may pass on a second look.
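The composite scoring and threshold rule above can be sketched as follows. This is a minimal illustration, assuming dimension scores have already been assigned by the evaluator; the record layout (`he_dims`, `he_score`) is hypothetical.

```python
# Prof. He 4-dimension filter: sum the four dimension scores (1-5 each,
# max 20), eliminate ideas below FILTER_THRESHOLD, and rank survivors
# by composite score, descending (the report order).
FILTER_THRESHOLD = 12

def he_composite(dims: dict) -> int:
    """Sum the four He-framework dimensions (each scored 1-5)."""
    return sum(dims[d] for d in ("longevity", "passion", "application", "uniqueness"))

def apply_he_filter(ideas: list, threshold: int = FILTER_THRESHOLD):
    for idea in ideas:
        idea["he_score"] = he_composite(idea["he_dims"])
    survivors = [i for i in ideas if i["he_score"] >= threshold]
    eliminated = [i for i in ideas if i["he_score"] < threshold]
    survivors.sort(key=lambda i: i["he_score"], reverse=True)
    return survivors, eliminated

ideas = [
    {"id": "IDEA-01", "he_dims": {"longevity": 4, "passion": 3, "application": 4, "uniqueness": 4}},
    {"id": "IDEA-02", "he_dims": {"longevity": 2, "passion": 3, "application": 3, "uniqueness": 2}},
]
survivors, eliminated = apply_he_filter(ideas)
# IDEA-01 scores 15/20 and survives; IDEA-02 scores 10/20 and is eliminated
```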
Before finalizing the output, check each surviving idea against four common anti-patterns. This is a quality gate to catch ideas that look good on paper but have structural problems.
For each surviving idea, check:
"Overly trendy" — If 5+ papers in the landscape already address this exact angle, flag it. The space is crowded and differentiation will be hard.
"Overly niche" — If 0 papers in the landscape are even tangentially related, flag it. The idea may be too far from the current discourse to get reviewer buy-in, or it may indicate an underdeveloped landscape search.
"A+B stitching" — If the idea is essentially "combine method A and method B" without a clear new mechanism or insight that explains why the combination is non-trivially better, flag it. This is the most common anti-pattern in mediocre research.
"Scale-dependent" — If the expected result only holds at a computational scale the researcher cannot reproduce (e.g., "this works at 100B parameters but we can only test at 1B"), flag it. The contribution becomes unverifiable.
Flagging rules:
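The two countable checks ("overly trendy" and "overly niche") can be sketched mechanically from landscape overlap counts; the other two require judgment and are represented here only as pre-computed booleans on the idea record. An illustrative sketch, with hypothetical field names:

```python
# Flag the four anti-patterns for one idea. `related_landscape_papers` is
# the number of landscape papers addressing the same angle; the 5+/0
# thresholds come from the anti-pattern definitions above.
def anti_pattern_flags(idea: dict, related_landscape_papers: int) -> list:
    flags = []
    if related_landscape_papers >= 5:
        flags.append("overly-trendy")
    if related_landscape_papers == 0:
        flags.append("overly-niche")
    if idea.get("is_ab_stitch"):       # A+B with no new mechanism
        flags.append("a+b-stitching")
    if idea.get("scale_dependent"):    # result unverifiable at testable scale
        flags.append("scale-dependent")
    return flags
```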
Write two output files. Ensure the outputs/ directory exists before writing (create it if needed).
`outputs/IDEAS_RAW.md` — This file contains ALL generated ideas before any filtering, serving as a complete record.
# Generated Research Ideas (Raw)
**Direction**: [research direction from $ARGUMENTS]
**Date**: [today's date]
**Model**: gpt-5.4
**Landscape source**: [LANDSCAPE.md / abbreviated inline survey]
**Ideas generated**: [N]
**Codex thread ID**: [threadId for follow-up]
---
## IDEA-01: [title]
- **Thesis**: We show that X by Y
- **Gap addressed**: [specific gap from landscape, e.g., "G3: No existing work on Z"]
- **Core mechanism**: [the key technical insight]
- **Non-obvious because**: [skeptic's objection + rebuttal]
- **Contribution type**: [empirical finding / new method / theoretical result / diagnostic / new formulation]
- **Risk**: [LOW / MEDIUM / HIGH] — [1-sentence justification]
- **Effort**: [N] person-weeks
- **Closest work**: [paper title + authors/year] — delta: [what is specifically different]
---
## IDEA-02: [title]
[same structure]
---
[repeat for all ideas]
`outputs/IDEAS_FILTERED.md` — This file contains the filtered, scored, and ranked ideas — the actionable output.
# Filtered Research Ideas
**Direction**: [research direction from $ARGUMENTS]
**Date**: [today's date]
**Pipeline**: Generated [X] ideas -> Feasibility filter -> Novelty quick-check -> Impact filter -> Prof. He 4-dimension filter -> [Y] surviving
**Landscape source**: [LANDSCAPE.md / abbreviated inline survey]
**Codex thread ID**: [threadId for follow-up]
---
## Surviving Ideas (ranked by Prof. He composite score, descending)
### Rank 1: [title] (IDEA-XX)
- **Thesis**: We show that X by Y
- **Gap addressed**: [specific gap]
- **Core mechanism**: [technical insight]
- **Non-obvious because**: [skeptic's objection + rebuttal]
- **Contribution type**: [type]
- **Risk**: [level] — [justification]
- **Effort**: [N] person-weeks
- **Closest work**: [paper] — delta: [difference]
- **He Score**: Longevity [X] + Passion [X] + Application [X] + Uniqueness [X] = [XX]/20
- **Anti-pattern flags**: [none / list of flags with explanations]
- **Quick novelty**: [LIKELY NOVEL / NEEDS DEEPER CHECK]
- **Why this ranks #1**: [1-2 sentences explaining why this is the top recommendation]
---
### Rank 2: [title] (IDEA-XX)
[same structure]
---
[repeat for all 4-6 surviving ideas]
---
## Eliminated Ideas
| # | Idea | Stage | Reason |
|---|------|-------|--------|
| IDEA-XX | [title] | Feasibility | [e.g., Requires unavailable dataset (ImageNet-22k with annotations)] |
| IDEA-XX | [title] | Novelty | [e.g., Already published: "Paper Title" (Author et al., 2025)] |
| IDEA-XX | [title] | Impact | [e.g., Neither positive nor negative result would change practice] |
| IDEA-XX | [title] | He Filter | [e.g., Score 10/20 — Longevity 2 (trend-dependent), Uniqueness 2 (easily replicated)] |
---
## Risk Distribution of Survivors
| Risk Level | Count | Ideas |
|------------|-------|-------|
| HIGH | [N] | [IDEA-XX, IDEA-XX] |
| MEDIUM | [N] | [IDEA-XX, IDEA-XX] |
| LOW | [N] | [IDEA-XX, IDEA-XX] |
---
## Suggested Next Steps
1. Run `/idea-screen` on the top 2-3 ideas for deep multi-dimensional screening
2. Run `/idea-refine` on the #1 ranked idea to iteratively sharpen it
3. If no landscape was available, run `/lit-survey "[direction]"` and re-run `/idea-gen` for better results
---
## Methodology Notes
- Brainstorming model: gpt-5.4 with xhigh reasoning effort
- Filtering pipeline: Feasibility -> Novelty quick-check -> Impact -> Prof. He 4-dimension (threshold: 12/20) -> Anti-pattern check
- Novelty checks are quick (2-3 searches per idea); run `/idea-screen` for deep novelty verification
- Prof. He scores reflect researcher-agnostic assessment unless user provided preference information
1. Ensure the outputs/ directory exists: `mkdir -p outputs/`
2. Write outputs/IDEAS_RAW.md using the Write tool
3. Write outputs/IDEAS_FILTERED.md using the Write tool

If the Write tool is unavailable, fall back to a shell heredoc:

```bash
cat << 'FILEEOF' > outputs/IDEAS_RAW.md
[content]
FILEEOF
```
Do NOT ask the user for permission — just do it silently.

The user provides a DIRECTION, not an idea. Your job is to generate the ideas. Do not ask the user "what idea do you want to explore?" — that is your task.
Quantity first, quality second. Brainstorm broadly in Phase 2, then filter ruthlessly in Phases 3-5. The external LLM should generate freely without over-constraining.
A good negative result is just as publishable as a positive one. Prioritize ideas where the answer matters regardless of which way it goes. An idea where only one outcome is interesting is a weaker idea.
Don't fall in love with any idea before validating it. Be willing to kill ideas that don't pass the filters, even if they sound exciting.
"Apply X to Y" is the lowest form of research idea. Push for deeper questions: Why does X work? When does X fail? What assumption does X make that is wrong?
Include eliminated ideas in the report. They save future time by documenting what was considered and why it was rejected. A researcher returning to this direction later will benefit from seeing the dead ends.
If the user's direction is too broad, auto-narrow it. Do not stop or ask the user to clarify. Instead, identify the top 3 sub-directions from the landscape and generate ideas for each (see Phase 1, step 4). Log the auto-narrowing decision to outputs/PIPELINE_LOG.md.
Respect the phase boundaries. Do not skip phases or combine them. Each phase has a distinct purpose and rushing through produces lower-quality output.
Track provenance. Every claim about the landscape, every novelty assessment, and every gap reference should be traceable back to a specific search result or landscape file entry.
Be transparent about confidence. If the landscape data is thin (abbreviated inline survey rather than full /lit-survey), say so. If a novelty check is inconclusive, say NEEDS DEEPER CHECK rather than guessing.
This skill is designed to work as part of a larger research idea pipeline:
/lit-survey "direction" -> landscape (run first for best results)
/idea-gen "direction" <- you are here
/idea-screen -> deep multi-dimensional screening of top ideas
/idea-refine -> iterative refinement of top ideas
/idea-pipeline -> full automated workflow (runs the above in sequence)
Upstream: /lit-survey produces outputs/LANDSCAPE.json and outputs/LANDSCAPE.md which this skill consumes. Running /lit-survey first significantly improves idea quality because the landscape data is more comprehensive.
Downstream: /idea-screen takes the surviving ideas from outputs/IDEAS_FILTERED.md and performs deep multi-dimensional screening (novelty verification, critical review, competitive analysis). /idea-refine then iteratively sharpens the top ideas based on screening feedback.