Turn a refined research proposal or method idea into a detailed, claim-driven experiment roadmap. Use after `aris-research-refine`, or when the user asks for a detailed experiment plan, ablation matrix, evaluation protocol, run order, compute budget, or paper-ready validation that supports the core problem, novelty, simplicity, and any LLM / VLM / Diffusion / RL-based contribution.
Refine and concretize: $ARGUMENTS
Use this skill after the method is stable enough that the next question becomes: what exact experiments should we run, in what order, to defend the paper? If the user wants the full chain in one request, prefer /aris-research-refine-pipeline.
The goal is not to generate a giant benchmark wishlist. The goal is to turn the proposal into a claim -> evidence -> run order roadmap that supports four things: the core problem, the novelty, the simplicity argument, and any frontier (LLM / VLM / Diffusion / RL-based) contribution.
refine-logs/ — Default destination for experiment planning artifacts. Read the most relevant existing files first if they exist:
- refine-logs/FINAL_PROPOSAL.md
- refine-logs/REVIEW_SUMMARY.md
- refine-logs/REFINEMENT_REPORT.md

Extract:
If these files do not exist, derive the same information from the user's prompt.
Before proposing experiments, write down the claims that must be defended.
Use this structure:
Do not exceed MAX_PRIMARY_CLAIMS unless the paper truly has multiple inseparable claims.
Design the paper around a compact set of experiment blocks. Default to the following blocks and delete any that are not needed:
For each block, decide whether it belongs in the main paper, the appendix, or the intentionally-cut list.
Prefer one strong baseline family over many weak baselines. If a stronger modern baseline exists, use it instead of padding the list.
For every kept block, fully specify the claim tested, dataset and split, compared systems, metrics, setup details, success criterion, and failure interpretation.
Special rules:
Build a realistic run order so the user knows what to do first.
Use this milestone structure:
For each milestone, estimate the runs involved, the decision gate, the cost, and the risk.
Separate must-run from nice-to-have experiments.
Write refine-logs/EXPERIMENT_PLAN.md using this structure:
# Experiment Plan
**Problem**: [problem]
**Method Thesis**: [one-sentence thesis]
**Date**: [today]
## Claim Map
| Claim | Why It Matters | Minimum Convincing Evidence | Linked Blocks |
|-------|-----------------|-----------------------------|---------------|
| C1 | ... | ... | B1, B2 |
## Paper Storyline
- Main paper must prove:
- Appendix can support:
- Experiments intentionally cut:
## Experiment Blocks
### Block 1: [Name]
- Claim tested:
- Why this block exists:
- Dataset / split / task:
- Compared systems:
- Metrics:
- Setup details:
- Success criterion:
- Failure interpretation:
- Table / figure target:
- Priority: MUST-RUN / NICE-TO-HAVE
### Block 2: [Name]
...
## Run Order and Milestones
| Milestone | Goal | Runs | Decision Gate | Cost | Risk |
|-----------|------|------|---------------|------|------|
| M0 | ... | ... | ... | ... | ... |
## Compute and Data Budget
- Total estimated GPU-hours:
- Data preparation needs:
- Human evaluation needs:
- Biggest bottleneck:
## Risks and Mitigations
- [Risk]:
- [Mitigation]:
## Final Checklist
- [ ] Main paper tables are covered
- [ ] Novelty is isolated
- [ ] Simplicity is defended
- [ ] Frontier contribution is justified or explicitly not claimed
- [ ] Nice-to-have runs are separated from must-run runs
Write refine-logs/EXPERIMENT_TRACKER.md using this structure:
# Experiment Tracker
| Run ID | Milestone | Purpose | System / Variant | Split | Metrics | Priority | Status | Notes |
|--------|-----------|---------|------------------|-------|---------|----------|--------|-------|
| R001 | M0 | sanity | ... | ... | ... | MUST | TODO | ... |
Keep the tracker compact and execution-oriented.
Experiment plan ready.
Must-run blocks:
- [Block 1]
- [Block 2]
Highest-risk assumption:
- [risk]
First three runs to launch:
1. [run]
2. [run]
3. [run]
Plan file: refine-logs/EXPERIMENT_PLAN.md
Tracker file: refine-logs/EXPERIMENT_TRACKER.md
Large file handling: If the Write tool fails due to file size, immediately retry using Bash (`cat << 'EOF' > file`) to write in chunks. Do NOT ask the user for permission; just do it silently.
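The chunked fallback can be sketched as follows. The file path and section headings are illustrative; the key points are that the first heredoc truncates the file with `>` and later heredocs append with `>>`, keeping each chunk small:

```shell
mkdir -p refine-logs

# First chunk creates (or truncates) the file.
cat << 'EOF' > refine-logs/EXPERIMENT_PLAN.md
# Experiment Plan
## Claim Map
EOF

# Later chunks append, one section at a time.
cat << 'EOF' >> refine-logs/EXPERIMENT_PLAN.md
## Experiment Blocks
### Block 1: [Name]
EOF
```

The quoted `'EOF'` delimiter prevents the shell from expanding `$`-variables inside the markdown content.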
Every experiment must defend a claim. If it does not change a reviewer belief, cut it.
Prefer a compact paper story. Design the main table first, then add only the ablations that defend it.
Defend simplicity explicitly. If complexity is a concern, include a deletion study or a stronger-but-bloated variant comparison.
Defend frontier choices explicitly. If a modern primitive is central, prove why it is better than the strongest simpler alternative.
Prefer strong baselines over long baseline lists. A short, credible comparison set is better than a padded one.
Separate must-run from nice-to-have. Do not let appendix ideas delay the core paper evidence.
Reuse proposal constraints. Do not invent unrealistic budgets or data assumptions.
Do not fabricate results. Plan evidence; do not claim evidence.
/aris-research-refine-pipeline -> one-shot method + experiment planning
/aris-research-refine -> method and claim refinement
/aris-experiment-plan -> detailed experiment roadmap
/aris-run-experiment -> execute the runs
/aris-auto-review-loop -> react to results and iterate on the paper