This skill helps an LLM generate correct AxAgent tuning and evaluation code using `@ax-llm/ax`. Use it when the user asks about `agent.optimize(...)`, `judgeOptions`, eval datasets, optimization targets, saved `optimizedProgram` artifacts, or recursive optimization guidance.
Use this skill for `agent.optimize(...)` workflows. Prefer short, modern, copyable patterns. Do not repeat general agent-authoring guidance unless the user needs it.
Your job is to help the model choose a good optimization setup for the user's actual goal:
- Run `agent.optimize(...)` only after the agent is already configured and runnable.
- Write a custom `metric` when success is easy to score from the prediction and task record.
- Use `judgeAI` plus `judgeOptions` for built-in LLM judging, or a standalone `AxGen` evaluator when the user needs LLM-as-judge behavior outside the built-in `agent.optimize(...)` flow.
- Default to optimizing `root.actor`; use `target: 'responder'` or explicit program IDs only when the user clearly asks for that.
- Prefer typed `f.object(...)` over vague `f.json(...)` whenever the agent must reason about returned fields.
- Persist `result.optimizedProgram`, then restore with `new AxOptimizedProgramImpl(...)` and `agent.applyOptimization(...)`.
- For recursive agents, enable `mode: 'advanced'` on the agent and tune against realistic `recursionOptions`.

Pick the optimization shape from the user's need:
- Tool-use and delegation correctness: optimize `root.actor` with `expectedActions` and `forbiddenActions`.
- Final answer quality: optimize `target: 'responder'`, but only if the task is not mostly tool-selection or clarification behavior.
- Recursive behavior: set `mode: 'advanced'` and use tasks that actually exercise recursion depth, fan-out, and termination choices.

Choose task design carefully:
Optimization works much better when the agent and dataset remove avoidable ambiguity:
- If the actor should pass structured `llmQuery(..., context)` payloads, say that directly in the actor prompt.
- Keep `maxSubAgentCalls` small in examples unless the user is explicitly testing broad fan-out behavior.
- Avoid `javascript:` prefixes, mixed prose/code, and multi-snippet turns.

Good pattern:
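A sketch of what an unambiguous task record can look like, using the hypothetical `email.sendEmail` tool and the task field names from this skill's `optimize` examples:

```typescript
// Hypothetical task record: concrete recipient, explicit success criteria,
// and the exact qualified tool name the agent is expected to call.
const goodTask = {
  input: { query: 'Send an email to jim@example.com saying good morning.' },
  criteria: 'Call the email tool once; the recipient must be jim@example.com.',
  expectedActions: ['email.sendEmail'],
  forbiddenActions: [],
};
console.log(goodTask.expectedActions[0]); // email.sendEmail
```

The input names a concrete entity, the criteria state one checkable behavior, and the expected action pins the exact tool, which removes the ambiguity the bad patterns below introduce.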
Bad pattern:
- A tool that returns `json` with an underspecified shape
- A query that mentions Atlas without clarifying whether that is a project, team, or account

Choose the scoring path based on how objectively the task can be measured:
- Write a custom `metric` when you can score success directly from `prediction` and `example`.
- Use `judgeOptions.description` to tell the built-in judge what to value most.
- Use a standalone evaluator when the user is not calling `agent.optimize(...)` and still wants LLM judging.

Quick rules:
- Inside `agent.optimize(...)`, prefer the built-in judge; outside it, use an `AxGen` evaluator.

Important:
- A custom `metric` overrides the built-in judge path entirely.
- Do not combine an `AxGen` judge `metric` and built-in judge guidance unless the user explicitly wants two separate scoring systems and understands only the custom metric drives optimization.
- In an `AxGen` judge metric, prefer a numeric `score:number` output over a string tier when possible. It is simpler and less fragile in practice.

```typescript
import {
  AxAIGoogleGeminiModel,
  AxJSRuntime,
  AxOptimizedProgramImpl,
  axDefaultOptimizerLogger,
  agent,
  ai,
  f,
  fn,
} from '@ax-llm/ax';

const tools = [
  fn('sendEmail')
    .namespace('email')
    .description('Send an email message')
    .arg('to', f.string('Recipient email address'))
    .arg('body', f.string('Email body text'))
    .returns(
      f.object({
        sent: f.boolean('Whether the email was sent'),
        to: f.string('Recipient email address'),
      })
    )
    .handler(async ({ to }) => ({ sent: true, to }))
    .build(),
];

const studentAI = ai({
  name: 'google-gemini',
  apiKey: process.env.GOOGLE_APIKEY!,
  config: { model: AxAIGoogleGeminiModel.Gemini25FlashLite, temperature: 0.2 },
});

const judgeAI = ai({
  name: 'google-gemini',
  apiKey: process.env.GOOGLE_APIKEY!,
  config: { model: AxAIGoogleGeminiModel.Gemini3Pro, temperature: 1.0 },
});

const assistant = agent('query:string -> answer:string', {
  ai: studentAI,
  judgeAI,
  contextFields: [],
  runtime: new AxJSRuntime(),
  functions: { local: tools },
  contextPolicy: { preset: 'checkpointed', budget: 'balanced' },
  judgeOptions: {
    description: 'Prefer correct tool use over polished wording.',
    model: 'judge-model',
  },
});

const tasks = [
  {
    input: { query: 'Send an email to Jim saying good morning.' },
    criteria: 'Use the email tool and send the message to Jim.',
    expectedActions: ['email.sendEmail'],
  },
];

const result = await assistant.optimize(tasks, {
  target: 'actor',
  maxMetricCalls: 12,
  verbose: true,
  optimizerLogger: axDefaultOptimizerLogger,
  onProgress: (progress) => {
    console.log(
      `round ${progress.round}/${progress.totalRounds} current=${progress.currentScore} best=${progress.bestScore}`
    );
  },
});

const saved = JSON.stringify(result.optimizedProgram, null, 2);
const restored = new AxOptimizedProgramImpl(JSON.parse(saved));
assistant.applyOptimization(restored);
```
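Because `result.optimizedProgram` serializes to plain JSON, the save/restore round-trip can be sanity-checked with any plain object. A minimal sketch of the invariant (the artifact shape here is illustrative, not the real one):

```typescript
// Sketch: the persisted artifact must survive JSON round-tripping unchanged
// before being handed to AxOptimizedProgramImpl. Shape below is hypothetical.
const artifact = {
  instruction: 'Prefer correct tool use over polished wording.',
  demos: [{ query: 'Send an email to Jim.', answer: 'Done.' }],
};
const savedJson = JSON.stringify(artifact, null, 2);
const restoredArtifact = JSON.parse(savedJson);
console.log(restoredArtifact.instruction === artifact.instruction); // true
```

This also means the artifact can be stored in a file, database row, or environment-specific config without any custom serialization.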
## Custom Metric Pattern

Use this when the task has crisp correctness and cost/behavior tradeoffs:
```typescript
const result = await assistant.optimize(tasks, {
  target: 'actor',
  metric: ({ prediction, example }) => {
    if (prediction.completionType !== 'final' || !prediction.output) {
      return 0;
    }
    let score = 0;
    if (prediction.output.answer.includes('Jim')) score += 0.4;
    if (
      prediction.functionCalls.some(
        (call) => call.qualifiedName === 'email.sendEmail'
      )
    ) {
      score += 0.4;
    }
    if ((prediction.recursiveStats?.recursiveCallCount ?? 0) === 0) {
      score += 0.2;
    }
    return score;
  },
});
```
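The same partial-credit shape can be unit-tested without running the agent. A sketch with a hand-built prediction, where `MiniPrediction` is a hypothetical stand-in for the real prediction object:

```typescript
// Hypothetical minimal prediction shape, mirroring only the fields the metric reads.
interface MiniPrediction {
  completionType: string;
  output?: { answer: string };
  functionCalls: { qualifiedName: string }[];
  recursiveStats?: { recursiveCallCount: number };
}

// Same scoring rules as the metric above, extracted as a pure function.
function scorePrediction(p: MiniPrediction): number {
  if (p.completionType !== 'final' || !p.output) return 0;
  let score = 0;
  if (p.output.answer.includes('Jim')) score += 0.4; // right recipient mentioned
  if (p.functionCalls.some((c) => c.qualifiedName === 'email.sendEmail')) score += 0.4; // right tool
  if ((p.recursiveStats?.recursiveCallCount ?? 0) === 0) score += 0.2; // no unnecessary recursion
  return score;
}

const good: MiniPrediction = {
  completionType: 'final',
  output: { answer: 'Sent good morning to Jim.' },
  functionCalls: [{ qualifiedName: 'email.sendEmail' }],
};
console.log(scorePrediction(good)); // 1
console.log(scorePrediction({ completionType: 'askClarification', functionCalls: [] })); // 0
```

Keeping the scorer pure makes it cheap to iterate on weights before spending real metric calls in an optimization run.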
Use this pattern when:

- Success can be scored directly from `prediction` and `example`.
- The user wants deterministic, cheap scoring without extra judge calls.
## Built-in Judge Pattern

Use this when the agent behavior needs holistic review:
```typescript
const result = await assistant.optimize(tasks, {
  judgeAI,
  judgeOptions: {
    model: AxAIGoogleGeminiModel.Gemini3Pro,
    description:
      'Be strict about unnecessary delegation, weak clarifications, and incorrect tool choices.',
  },
  maxMetricCalls: 12,
});
```
Use this pattern when:

- Quality is subjective or hard to reduce to a crisp metric.
- The user wants one score that weighs tool choices, clarifications, and wording together.
## AxGen Judge Pattern

Use this only when the user needs LLM judging outside the built-in `agent.optimize(...)` path:
```typescript
import { AxGen, s } from '@ax-llm/ax';

const judgeGen = new AxGen(
  s(`
    taskInput:json "Task input",
    candidateOutput:json "Candidate output",
    expectedOutput?:json "Optional reference output"
    ->
    score:number "Normalized score from 0 to 1"
  `)
);

judgeGen.setInstruction(
  'Score the candidate output from 0 to 1. Reward correctness and task completion. Return only the score field.'
);

const metric = async ({ prediction, example }) => {
  const result = await judgeGen.forward(judgeAI, {
    taskInput: example,
    candidateOutput: prediction,
    expectedOutput: example.expectedOutput,
  });
  return Math.max(0, Math.min(1, result.score));
};

// optimizer, program, train, and validation come from the surrounding setup.
const result = await optimizer.compile(program, train, metric, {
  validationExamples: validation,
});
```
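Judge models occasionally emit out-of-range or non-numeric scores. A defensive variant of the clamp used in the metric above (a suggestion, not part of the library):

```typescript
// Clamp judge scores into [0, 1]; treat NaN/Infinity as 0 so one malformed
// judgement cannot distort an optimization run.
function normalizeScore(raw: number): number {
  if (!Number.isFinite(raw)) return 0;
  return Math.max(0, Math.min(1, raw));
}
console.log(normalizeScore(1.4)); // 1
console.log(normalizeScore(Number.NaN)); // 0
```

Without the finiteness check, `Math.max(0, Math.min(1, NaN))` returns `NaN`, which silently poisons score aggregation.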
Use this pattern when:
- The user is optimizing an `AxGen`, flow, or another program directly.
- The user is not going through the `agent.optimize(...)` wrapper.

Notes:

- Use `expectedActions` and `forbiddenActions` when tool correctness matters.
- `judgeOptions` mirrors normal forward options and supports extra judge guidance through `description`.
- Recursive runs expose `recursiveTrace` and `recursiveStats`.
- If you pass a custom `metric`, that overrides the built-in judge path.

Decision rules:
- Use an `AxGen` evaluator when the user is not calling `agent.optimize(...)` but still wants LLM judging.
- Use `judgeOptions.description` to steer the judge toward the user's real priority, such as tool correctness, brevity, groundedness, or policy compliance.
- `agent.optimize(...)` runs each evaluation rollout from a clean continuation state.
- State from `getState()` and `setState(...)` is not used during eval rollouts.
- `askClarification(...)` is treated as a scored evaluation outcome instead of going through the responder.
- A clarification outcome has `prediction.completionType === 'askClarification'`, a populated `prediction.clarification`, and an absent `prediction.output`.
- A final outcome has `prediction.completionType === 'final'` and a populated `prediction.output`.
- `target: 'responder'` still works, but clarification-heavy tasks are usually low-signal for responder optimization.
- Optimization targets the `mode: 'advanced'` top-level agent; child recursion behavior still follows `recursionOptions`.
- Keep the `maxDepth` and tool/discovery structure you expect in production.
- Persist `result.optimizedProgram` if the user wants portable artifacts.
- Restore with `new AxOptimizedProgramImpl(...)`, then call `agent.applyOptimization(...)` (the same restore path applies to a plain `AxGen`).
- Prefer typed returns over vague `json` tool returns when the agent must reason about specific fields across recursive steps.
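The clarification-vs-final contract described above can be captured as a small guard for custom metrics, so clarification outcomes are scored deliberately rather than silently treated as failures (field names follow the rules above; the helper itself is hypothetical):

```typescript
// Classify an eval rollout outcome before scoring it.
function outcomeKind(p: {
  completionType: string;
  output?: unknown;
  clarification?: unknown;
}): 'final' | 'clarification' | 'invalid' {
  if (p.completionType === 'final' && p.output) return 'final';
  if (p.completionType === 'askClarification' && p.clarification) return 'clarification';
  return 'invalid';
}

console.log(outcomeKind({ completionType: 'final', output: { answer: 'ok' } })); // final
console.log(outcomeKind({ completionType: 'askClarification', clarification: 'Which Jim?' })); // clarification
console.log(outcomeKind({ completionType: 'final' })); // invalid
```

A metric can then branch on the kind, for example giving partial credit when a clarification was the correct move for an ambiguous task.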