Metaopt Experiment Design

Overview

Transform the winning proposal into a concrete experiment batch specification suitable for the backend contract. This is the bridge between a high-level idea and an actionable execution plan — it takes the selected proposal and produces a design detailed enough for the materialization worker to implement without ambiguity.

This skill is a leaf worker operating in the design auxiliary lane. It receives the winning proposal and campaign context from the orchestrator via a subagent prompt, produces a concrete experiment specification, and returns it. It does not generate code, interact with files, dispatch subagents, or manage state.

Lane: Auxiliary slot (design)

Model class: strong_reasoner (prefer a strong reasoning model such as Opus 4.6 fast; fall back to a capable general model such as GPT-5.4)

Input Contract

The orchestrator supplies all inputs as structured context in the subagent prompt. This skill never reads files directly from the campaign repo.

Standard Envelope (provided by orchestrator on every dispatch)

Field	Type	Description
`campaign_id`	string	Campaign identifier
`current_iteration`	integer	Current iteration number
`slot_id`	string	The slot ID dispatching this worker
`attempt`	integer	Attempt number for this dispatch (1-indexed)

Winning Proposal

Field	Type	Description
`winning_proposal`	object	The complete proposal selected by the synthesis phase
`winning_proposal.proposal_id`	string	Non-empty identifier for the proposal
`winning_proposal.title`	string	Concise name for the experiment
`winning_proposal.rationale`	string	Why this change is expected to improve the metric
`winning_proposal.target_area`	string	Which pipeline aspect this targets
`winning_proposal.expected_impact`	object	`{ direction, magnitude }` — the ideation worker's impact assessment

Campaign Context

Field	Type	Description
`goal`	string	The campaign's top-level improvement goal
`metric`	string	The objective metric being optimized
`direction`	`"minimize"` or `"maximize"`	Whether lower or higher metric values are better
`aggregation_method`	string	How per-dataset scores roll up into the aggregate (e.g. `mean`, `weighted_mean`)
`aggregation_weights`	object or null	Per-dataset weights when method is `weighted_mean`; `null` otherwise
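
To illustrate how `aggregation_method` and `aggregation_weights` might interact, the sketch below rolls per-dataset scores up into one aggregate. The function name `aggregate_scores` and the choice to normalize by the sum of the weights are assumptions made for this example; the campaign spec may define `weighted_mean` differently.

```python
from typing import Mapping, Optional


def aggregate_scores(
    scores: Mapping[str, float],
    aggregation_method: str,
    aggregation_weights: Optional[Mapping[str, float]] = None,
) -> float:
    """Roll per-dataset scores up into a single aggregate (illustrative only)."""
    if not scores:
        raise ValueError("scores must contain at least one dataset")

    if aggregation_method == "mean":
        return sum(scores.values()) / len(scores)

    if aggregation_method == "weighted_mean":
        if aggregation_weights is None:
            raise ValueError("weighted_mean requires aggregation_weights")
        missing = set(scores) - set(aggregation_weights)
        if missing:
            raise ValueError(f"missing weights for datasets: {sorted(missing)}")
        total_weight = sum(aggregation_weights[d] for d in scores)
        if total_weight <= 0:
            raise ValueError("weights must sum to a positive value")
        # Assumption: weights are normalized by their sum.
        weighted = sum(scores[d] * aggregation_weights[d] for d in scores)
        return weighted / total_weight

    raise ValueError(f"unsupported aggregation_method: {aggregation_method!r}")
```

The same helper could be applied to `per_dataset_baselines` to cross-check `aggregate_baseline`, provided the baseline was computed with the same method.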

Dataset Definitions

Field	Type	Description
`datasets`	array	Dataset entries from the campaign spec
`datasets[].id`	string	Dataset identifier
`datasets[].role`	string	Dataset role (e.g. `train`, `eval`, `test`)
`datasets[].fingerprint`	string	Content-addressed fingerprint for the dataset

Execution Config

Field	Type	Description
`execution`	object	Execution configuration from the campaign spec
`execution.runner_type`	string	How trials are executed (e.g. `ray_tune`, `subprocess`)
`execution.entrypoint`	string	Shell command used to run the experiment
`execution.trial_budget`	object	`{ kind: string, value: number }` — budget type and amount
`execution.search_strategy`	object	`{ kind: string, ...params }` — search strategy and parameters

Baselines

Field	Type	Description
`aggregate_baseline`	number	Current aggregate baseline score
`per_dataset_baselines`	object	Map of dataset IDs to their current numeric baseline values

Historical Context

Field	Type	Description
`key_learnings`	array	Learnings extracted from prior iterations
`completed_experiments`	array	Summary of all previously run experiments and their outcomes

Output Contract

Experiment Specification (required)

Field	Type	Description
`experiment_name`	string	A descriptive name for the experiment batch (≤ 15 words)
`experiment_description`	string	One-paragraph summary of what the experiment does and why
`proposal_id`	string	Echoes the winning proposal's `proposal_id`

Code Change Guidance (required)

Field	Type	Description
`code_changes`	array	File-level guidance for the materialization worker
`code_changes[].file_path`	string	Relative path to the file that needs modification
`code_changes[].change_type`	string	One of `modify`, `create`, `delete`
`code_changes[].description`	string	What the change should accomplish
`code_changes[].key_constraints`	array of strings	Important constraints the implementation must respect
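
The field rules above could be checked mechanically before the specification is returned. The following is a minimal sketch, not part of the contract: the helper name `check_code_changes` is hypothetical, and it assumes POSIX-style paths when deciding whether `file_path` is relative.

```python
from pathlib import PurePosixPath
from typing import Any, Dict, List

ALLOWED_CHANGE_TYPES = {"modify", "create", "delete"}


def check_code_changes(code_changes: List[Dict[str, Any]]) -> List[str]:
    """Return a list of human-readable problems; an empty list means no issues."""
    problems: List[str] = []
    for i, change in enumerate(code_changes):
        path = change.get("file_path", "")
        # Assumption: paths use POSIX separators, so a leading "/" means absolute.
        if not path or PurePosixPath(path).is_absolute():
            problems.append(f"code_changes[{i}].file_path must be a non-empty relative path")
        if change.get("change_type") not in ALLOWED_CHANGE_TYPES:
            problems.append(f"code_changes[{i}].change_type must be one of {sorted(ALLOWED_CHANGE_TYPES)}")
        if not str(change.get("description", "")).strip():
            problems.append(f"code_changes[{i}].description must not be empty")
        constraints = change.get("key_constraints")
        if not isinstance(constraints, list) or not all(isinstance(c, str) for c in constraints):
            problems.append(f"code_changes[{i}].key_constraints must be an array of strings")
    return problems
```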

Hyperparameter Search Space (required when applicable)

Field	Type	Description
`search_space`	object or `null`	Hyperparameter search space definition; `null` if no hyperparameter search is involved
`search_space.parameters`	array	Parameters to search over
`search_space.parameters[].name`	string	Parameter name
`search_space.parameters[].type`	string	One of `continuous`, `discrete`, `categorical`
`search_space.parameters[].range`	array or object	Value range or set of allowed values
`search_space.parameters[].default`	any	Default or baseline value if known
`search_space.strategy`	string	Must match `execution.search_strategy.kind`
`search_space.trial_budget`	integer	Must not exceed `execution.trial_budget.value`
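
The two "must" rules in this table lend themselves to a simple consistency check against the `execution` input. The sketch below is one possible way to express them; `check_search_space` is a hypothetical helper, and comparing `search_space.trial_budget` directly with `execution.trial_budget.value` assumes both are expressed in the same unit (for example, a number of trials).

```python
from typing import Any, Dict, List, Optional

ALLOWED_PARAMETER_TYPES = {"continuous", "discrete", "categorical"}


def check_search_space(
    search_space: Optional[Dict[str, Any]],
    execution: Dict[str, Any],
) -> List[str]:
    """Return problems found in search_space relative to the execution config."""
    if search_space is None:
        # null is allowed when no hyperparameter search is involved.
        return []

    problems: List[str] = []

    expected_strategy = execution["search_strategy"]["kind"]
    if search_space.get("strategy") != expected_strategy:
        problems.append(f"search_space.strategy must be {expected_strategy!r}")

    budget_cap = execution["trial_budget"]["value"]
    trial_budget = search_space.get("trial_budget")
    if not isinstance(trial_budget, int) or isinstance(trial_budget, bool):
        problems.append("search_space.trial_budget must be an integer")
    elif trial_budget > budget_cap:
        # Assumption: both budgets are measured in the same unit.
        problems.append(f"search_space.trial_budget must not exceed {budget_cap}")

    for i, param in enumerate(search_space.get("parameters", [])):
        if param.get("type") not in ALLOWED_PARAMETER_TYPES:
            problems.append(f"search_space.parameters[{i}].type is not a supported type")

    return problems
```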

Dataset Usage Plan (required)

Field	Type	Description
`dataset_plan`	array	How each dataset is used in the experiment
`dataset_plan[].dataset_id`	string	Matches an entry from the campaign's `datasets`
`dataset_plan[].role`	string	Role in this experiment (e.g. `train`, `eval`, `holdout`)
`dataset_plan[].notes`	string	Any special handling or preprocessing required
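
Because each `dataset_plan[].dataset_id` has to match a campaign dataset, and every campaign dataset is expected to appear in the plan, a set comparison is enough to spot gaps. This is a sketch under that reading; the helper name `check_dataset_plan` and the shape of its return value are illustrative choices, not part of the contract.

```python
from typing import Any, Dict, List


def check_dataset_plan(
    datasets: List[Dict[str, Any]],
    dataset_plan: List[Dict[str, Any]],
) -> Dict[str, List[str]]:
    """Compare campaign dataset IDs with the IDs referenced in the plan."""
    campaign_ids = {d["id"] for d in datasets}
    planned_ids = {p["dataset_id"] for p in dataset_plan}
    return {
        # Campaign datasets that the plan never mentions.
        "unplanned": sorted(campaign_ids - planned_ids),
        # Plan entries that do not correspond to any campaign dataset.
        "unknown": sorted(planned_ids - campaign_ids),
    }
```

A plan is complete under this check when both lists come back empty.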

Artifact Expectations (required)

Field	Type	Description
`artifact_expectations`	object	What the materialization phase must produce
`artifact_expectations.code_artifact`	string	Description of the immutable code artifact expected
`artifact_expectations.data_manifest`	string	Description of the data manifest expected
`artifact_expectations.additional_artifacts`	array of strings	Any other artifacts the experiment requires

Success Criteria (required)

Field	Type	Description
`success_criteria`	object	How to judge whether the experiment succeeded
`success_criteria.metric`	string	Must match the campaign's `metric`
`success_criteria.direction`	string	Must match the campaign's `direction`
`success_criteria.minimum_improvement`	number or `null`	Minimum improvement over `aggregate_baseline`, measured in the campaign's `direction`, required to count as success; `null` if any directional improvement suffices
`success_criteria.per_dataset_expectations`	object or `null`	Optional expected per-dataset outcomes
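
One way to read these criteria is as a direction-aware comparison against the aggregate baseline. The sketch below makes two assumptions that the contract does not state: `minimum_improvement` is a non-negative absolute delta, and meeting it exactly counts as success. The function names are hypothetical.

```python
from typing import Optional


def directional_gain(new_value: float, baseline: float, direction: str) -> float:
    """Return how much new_value improves on baseline; positive means better."""
    if direction == "maximize":
        return new_value - baseline
    if direction == "minimize":
        return baseline - new_value
    raise ValueError(f"direction must be 'minimize' or 'maximize', got {direction!r}")


def meets_success_criteria(
    new_aggregate: float,
    aggregate_baseline: float,
    direction: str,
    minimum_improvement: Optional[float],
) -> bool:
    """Judge an aggregate result against the baseline and the required margin."""
    gain = directional_gain(new_aggregate, aggregate_baseline, direction)
    if minimum_improvement is None:
        # null: any strict improvement in the right direction suffices.
        return gain > 0
    # Assumption: the threshold is inclusive.
    return gain >= minimum_improvement
```

Expressing the threshold as a gain rather than as a target value keeps the same comparison valid for both `minimize` and `maximize` campaigns.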

Execution Assumptions (required)

Field	Type	Description
`execution_assumptions`	object	Resource and runtime expectations
`execution_assumptions.runner_type`	string	Must match the campaign's `execution.runner_type`
`execution_assumptions.estimated_duration`	string	Human-readable estimate (e.g. `"2-4 hours"`)
`execution_assumptions.resource_requirements`	string	Expected compute resources (e.g. `"1x GPU per trial"`)
`execution_assumptions.parallelism`	string	Expected trial parallelism (e.g. `"up to 4 concurrent trials"`)

Risks and Caveats (required)

Field	Type	Description
`risks`	array	Known risks, assumptions, or caveats the orchestrator should track
`risks[].description`	string	What the risk is
`risks[].mitigation`	string	How to mitigate or what to watch for

Common Mistakes

Mistake	Fix
Writing actual code instead of a specification	Describe what changes are needed at the file level, not how to write the code
Exceeding the trial budget with the search space	Check `execution.trial_budget.value` and constrain `search_space.trial_budget` accordingly
Using a search strategy different from the campaign config	Match `search_space.strategy` to `execution.search_strategy.kind` exactly
Leaving code change guidance too vague to implement	Specify file paths, change types, and what each change must accomplish
Ignoring per-dataset baselines when defining success	Reference both aggregate and per-dataset baselines in `success_criteria`
Omitting the dataset usage plan	Every dataset in the campaign must be accounted for in `dataset_plan`
Designing artifacts incompatible with the backend contract	Ensure the design expects immutable code artifacts and data manifests, not mutable working tree paths
Repeating a search space region already explored	Check `completed_experiments` and `key_learnings` to avoid redundant exploration
Providing no risk assessment	Always include at least one risk or caveat, even for straightforward experiments
Forgetting to echo the proposal ID	Include `proposal_id` in the experiment specification for lineage tracking
Defining success without referencing the baseline	State minimum improvement relative to `aggregate_baseline`, not in absolute terms
Designing for a runner type different from the campaign config	Match `execution_assumptions.runner_type` to `execution.runner_type`
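
Several of the fixes in this table are cross-field equality checks, so they can be sketched as a single pass over the finished specification. The layout assumed here is hypothetical: `spec` is a flat object keyed by the top-level output field names, and `inputs` is an object holding `winning_proposal`, `metric`, `direction`, and `execution` as supplied by the orchestrator.

```python
from typing import Any, Dict, List


def check_cross_field_consistency(spec: Dict[str, Any], inputs: Dict[str, Any]) -> List[str]:
    """Return violations of the 'must match' and 'always include' rules."""
    problems: List[str] = []

    if spec.get("proposal_id") != inputs["winning_proposal"]["proposal_id"]:
        problems.append("proposal_id does not echo winning_proposal.proposal_id")

    criteria = spec.get("success_criteria") or {}
    if criteria.get("metric") != inputs["metric"]:
        problems.append("success_criteria.metric does not match the campaign metric")
    if criteria.get("direction") != inputs["direction"]:
        problems.append("success_criteria.direction does not match the campaign direction")

    assumptions = spec.get("execution_assumptions") or {}
    if assumptions.get("runner_type") != inputs["execution"]["runner_type"]:
        problems.append("execution_assumptions.runner_type does not match execution.runner_type")

    # The table asks for at least one risk or caveat in every design.
    if not spec.get("risks"):
        problems.append("risks must contain at least one entry")

    return problems
```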