8-step interactive wizard that produces a fully configured conference.md file. TRIGGER when: user wants to set up a new conference, plan a conference, create conference.md. DO NOT TRIGGER when: user wants to run an existing conference (use autoconference).
An interactive wizard that produces a fully configured conference.md file. This skill does NOT run a conference — it only produces the configuration file.
The plan wizard exists because launching a conference with a bad config wastes hours of compute. Each step probes the user's intent, validates feasibility, and encodes their decisions into a conference.md ready for /autoconference to execute.
Output: A single, fully populated conference.md file.
Does NOT: Run researchers, spawn agents, or begin any research.
Walk the user through all 8 steps in order. Do not skip steps. Do not assume answers — present options and wait for explicit confirmation.
After all 8 steps, write the file and print next steps.
## Step 1: Goal

Objective: Arrive at a specific, measurable goal.
Ask the user:
"What do you want to achieve with this conference? Describe it in one or two sentences."
After the user answers, probe for specificity:
Push back on vague goals. If the user says something like "I want to improve my model" or "I want better results", stop and ask:
"That's a direction, but not yet a goal. Can you be more specific? For example: 'Improve model accuracy on the CIFAR-10 test set from 85% to above 90%' or 'Generate a synthesis document comparing retrieval strategies for long-context summarization.'"
Do NOT proceed to Step 2 until the goal is specific enough that a researcher agent could know what to work on.
Record: goal — the final agreed goal statement.
## Step 2: Mode

Objective: Choose metric or qualitative mode.
Explain the two modes clearly:
Metric mode — there is a numeric score that directly measures success. Researchers are evaluated by whether the number goes up (or down). Examples: accuracy, latency, F1 score, BLEU, loss, LLM judge score 1-10.
Qualitative mode — success is about the quality of reasoning, writing, or synthesis. There is no single number — the Reviewer (Opus) judges whether outputs satisfy the stated criteria. Examples: literature review synthesis, hypothesis generation, design exploration, writing quality improvement.
Help the user decide with this decision rule:
"Can you measure success with a number that is reliably produced each iteration? → metric. Is success about the quality of reasoning, writing, or ideas — where 'good' requires a human (or LLM judge) to assess? → qualitative."
Warn about metric mode requirements:
"Metric mode requires an evaluator script that can be run automatically. If you don't have a script that produces a number, start with qualitative mode — you can always switch."
Record: mode — metric or qualitative.
## Step 3: Success Measurement

Objective: Define how success is measured and verify the evaluator works.
Ask:
DRY-RUN GATE: Before proceeding, run the evaluator command on the baseline to verify it works.
[Running evaluator dry-run: {user's command}]
The dry-run gate is non-negotiable. A conference with an unverified evaluator is a conference that will fail.
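The gate can be sketched as a small shell check. Here `echo 0.85` is a stand-in for the user's real evaluator command, which is not known at authoring time; the regex accepts any plain integer or decimal score:

```shell
#!/bin/sh
# Dry-run gate sketch: run the evaluator once on the baseline and verify
# it emits a parseable number. "echo 0.85" is a stand-in for the user's
# actual evaluator command.
EVAL_CMD="echo 0.85"
SCORE=$($EVAL_CMD 2>/dev/null | tail -n 1)
if echo "$SCORE" | grep -Eq '^-?[0-9]+(\.[0-9]+)?$'; then
  echo "Dry-run OK: baseline score = $SCORE"
else
  echo "Dry-run FAILED: evaluator did not print a number" >&2
  exit 1
fi
```

Taking only the last output line keeps the check robust when the evaluator logs progress before printing its final score.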
Ask:
"Describe what 'good' looks like for this conference. The Reviewer (Opus) will use this as their rubric. Be specific — vague criteria produce vague judgments."
Prompt for specificity if needed:
Record:
- Metric mode: metric_name, metric_target, metric_direction, baseline_value, evaluator_command
- Qualitative mode: success_criteria (multi-line description)

## Step 4: Researchers

Objective: Decide how many researchers and whether a Devil's Advocate is needed.
Ask: "How many researchers should participate in this conference?"
Present options:
If the user already specified count in a partial conference.md, confirm it: "You mentioned N researchers earlier. Proceed with this?"
Next, ask about the Devil's Advocate:
"Should one of the N researchers be a Devil's Advocate — deliberately pursuing contrarian strategies?"
Explain:
"A Devil's Advocate is assigned to challenge the mainstream approach. They try the opposite of what seems obvious, test assumptions others take for granted, and explore strategies the other researchers would dismiss. This catches blind spots and occasionally discovers breakthroughs. If you have 3 researchers and add a Devil's Advocate, one of the 3 fills that role (no additional agent)."
Record: count, devil_advocate (yes/no), and if yes, which researcher slot (typically the last one, e.g., Researcher C for count=3).
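These decisions end up as fields in conference.md. A sketch of what this step contributes — field names and values are illustrative; the canonical names come from assets/conference_template.md:

```markdown
## Researchers
Count: 3
Devil's Advocate: yes (Researcher C)
```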
## Step 5: Search Space

Objective: Decide how researchers divide the search space.
Ask: "How should researchers divide the search space?"
Present options:
If assigned:
For each researcher slot (A, B, C, ...), ask:
"What should Researcher {X} focus on? Describe their specific area of exploration."
Example prompts to help the user think:
If free: no per-researcher focus needed. All researchers receive the full search space description.
Also ask (for both modes):
Record: partitioning_strategy, per-researcher focus areas (if assigned), allowed_changes, forbidden_changes.
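For an assigned strategy, the recorded decisions might land in conference.md roughly like this (illustrative field names and values, not the canonical template):

```markdown
## Search Space Partitioning
Strategy: assigned
Researcher A Focus: data augmentation strategies
Researcher B Focus: optimizer and learning-rate schedules
Researcher C Focus: architecture changes
Allowed changes: training hyperparameters, augmentation pipeline
Forbidden changes: test set, evaluator script
```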
## Step 6: Devil's Advocate

Objective: Configure the contrarian researcher's behavior (only if enabled in Step 4).
If Devil's Advocate was NOT enabled in Step 4, skip this step entirely.
If Devil's Advocate was enabled, configure their focus:
"The Devil's Advocate (Researcher {X}) will deliberately challenge the mainstream approach. Let's define their contrarian mandate."
Ask:
Explain the Devil's Advocate role one more time to confirm the user understands:
"The Devil's Advocate is not trying to win — they're trying to surface what everyone else is missing. Their best contribution is a finding that invalidates a shared assumption, even if their own metric score is low."
Record: devil_advocate_mandate — the contrarian researcher's specific instructions.
## Step 7: Execution

Objective: Configure how the conference will run.
Ask: "Do you want this conference to run overnight / unattended, or interactively with pauses for your review?"
Recommend:
"For overnight execution, use the autoconference loop script:

    bash scripts/autoconference-loop.sh ./your-conference-dir/

This script handles foreground, nohup, and tmux execution modes. I'll set `pause_every: never` in your conference.md so the Conference Chair advances automatically. To check progress while it runs:

    bash scripts/check_conference.sh ./your-conference-dir/"
Ask:
Ask:
Ask the same budget questions:
Record: pause_every (never / every_round / every_N_rounds / pivot_only), time_budget, max_rounds, max_total_iterations, researcher_timeout.
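An overnight run with a hard budget might record something like this (field names and values are illustrative):

```markdown
## Execution
pause_every: never
time_budget: 8h
max_rounds: 6
max_total_iterations: 30
researcher_timeout: 45m
```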
## Step 8: Write conference.md

Objective: Write the fully populated conference.md file.
First, ask: "Where should I write the conference.md file?" Suggest the default:

    ./conference/conference.md

Then scaffold the file using ../../assets/conference_template.md as the structural base. Fill in every field from Steps 1–7. Leave NO placeholders — every {...} in the template must be replaced with a real value.
| Template field | Source |
|---|---|
| {Title} | Derive from goal (e.g., "Optimize CIFAR-10 Accuracy") |
| Goal | Step 1: goal statement |
| Mode | Step 2: metric / qualitative |
| Success Metric | Step 3 (metric mode only) |
| Success Criteria | Step 3 (qualitative mode only) |
| Count | Step 4 |
| Iterations per round | Step 7: default 5, or ask if not set |
| Max rounds | Step 7 |
| Allowed changes | Step 5 |
| Forbidden changes | Step 5 |
| Search Space Partitioning → Strategy | Step 5 |
| Researcher A/B/C Focus | Step 5 (if assigned) |
| Max total iterations | Step 7 |
| Time budget | Step 7 |
| Researcher timeout | Step 7 |
| pause_every | Step 7 |
| Current Approach | Step 1: baseline description |
| Shared Knowledge | Leave blank (auto-populated at runtime) |
If Devil's Advocate is enabled, add a comment in the Researcher A/B/C Focus section indicating which researcher is the Devil's Advocate and their mandate from Step 6.
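A correctly scaffolded file opens with every field filled in — for example (illustrative values, reusing the CIFAR-10 example above):

```markdown
# Optimize CIFAR-10 Accuracy

Goal: Improve model accuracy on the CIFAR-10 test set from 85% to above 90%.
Mode: metric
Success Metric: test accuracy — target > 0.90, direction up, baseline 0.85
```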
Write the file. Then confirm to the user:
conference.md written to: {path}

Next steps:
1. Review the file: cat {path}
2. Run the conference: /autoconference (with {path} open or referenced)
   OR for overnight: bash scripts/autoconference-loop.sh {directory}/
3. Monitor progress: bash scripts/check_conference.sh {directory}/
   OR: tail -f {directory}/conference_events.jsonl
Skill chain: plan → autoconference → ship
Do NOT write conference.md before all 8 steps are complete. Do NOT run the conference — this skill only produces conference.md.