Use this skill when the user wants the agent to act as an automated judge that verifies whether a specific answer, conversation turn, or report is correct by comparing it against a provided 'gold' reference or a multi-dimensional rubric. Trigger it for requests like 'check my work', 'is this answer right according to the solution?', 'grade this response', 'evaluate the professionalism of this reply', or 'verify the math in this report'. Everyday examples include: 'Does my tax math match this rule book?', 'Is this student's response appropriate and ethical?', 'Can you audit this financial statement for me?', and 'Evaluate whether this chat got distracted, based on the provided guidelines.'
To synthesize data for this capability, you must strictly follow a 3-phase pipeline. Do not invent steps. Read the reference file for each phase in order:
Phase 1: Environment Exploration
Read the exploration guidelines to discover raw knowledge seeds:
references/EXPLORATION.md
Phase 2: Trajectory Selection
Once Phase 1 is complete, read the selection criteria to evaluate the trajectory it produced:
references/SELECTION.md
Phase 3: Data Synthesis
Once a trajectory passes Phase 2, read the synthesis instructions to generate the final data:
references/SYNTHESIS.md
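A minimal sketch of how the phases gate one another, assuming a simple counter of completed phases plus the pass/fail result of the Phase 2 check. The function `next_reference` and its parameters `completed` and `trajectory_passed` are hypothetical illustrations of the order stated above, not part of the skill's actual tooling.

```python
from typing import Optional

# Hypothetical model of the pipeline: (phase name, reference file to read).
PHASES = [
    ("Environment Exploration", "references/EXPLORATION.md"),
    ("Trajectory Selection", "references/SELECTION.md"),
    ("Data Synthesis", "references/SYNTHESIS.md"),
]


def next_reference(completed: int, trajectory_passed: bool) -> Optional[str]:
    """Return the next reference file to read, or None if the pipeline stops.

    completed: number of phases already finished (0 to 3).
    trajectory_passed: outcome of the Phase 2 selection check; only
    consulted once Phases 1 and 2 are both complete.
    """
    if completed >= len(PHASES):
        return None  # every phase is done
    if completed == 2 and not trajectory_passed:
        return None  # trajectory rejected in Phase 2: skip synthesis
    return PHASES[completed][1]
```

The key design point is that Phase 3 is reachable only through a passing Phase 2 result; a rejected trajectory ends the run instead of proceeding to synthesis.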