Use this skill when the user wants SQL training data in which the hard part is finding the right database or set of tables before writing the query. Trigger it for requests like “make examples where the agent has to figure out which dataset matters”, “the data should not already be handed to the model”, “it should need to look through many tables first”, or “make open-domain SQL questions.” Example trigger: “The question should not tell the model which database to use.” Example trigger: “Make examples where the agent has to search several candidate tables first.” Example trigger: “I want SQL tasks that feel like finding the right data source is half the job.”
[Case 1]
[Case 2]
To synthesize data for this capability, you must strictly follow a 3-phase pipeline. Do not hallucinate steps. Read the reference file for each phase in order, one phase at a time:
Phase 1: Environment Exploration
Read the exploration guidelines to discover raw knowledge seeds:
references/EXPLORATION.md
Phase 2: Trajectory Selection
Once Phase 1 is complete, read the selection criteria to evaluate the trajectory it produced:
references/SELECTION.md
Phase 3: Data Synthesis
Once a trajectory passes Phase 2, read the synthesis instructions to generate the final data:
references/SYNTHESIS.md
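The sequential reading order above can be sketched as a small Python helper. This is only an illustration: the phase names and file paths come from the list above, while the function name `load_phase_guides`, the `skill_root` argument, and the idea of loading the files from disk are assumptions rather than part of the skill itself.

```python
from pathlib import Path

# Phase names and relative paths are taken from the list above.
# Everything else (names, the skill_root argument) is hypothetical.
PHASES = [
    ("Environment Exploration", "references/EXPLORATION.md"),
    ("Trajectory Selection", "references/SELECTION.md"),
    ("Data Synthesis", "references/SYNTHESIS.md"),
]


def load_phase_guides(skill_root="."):
    """Yield (phase_number, phase_name, guide_text) in pipeline order.

    This is a generator, so each guide is read only when the caller
    asks for the next phase, i.e. after finishing the previous one.
    """
    root = Path(skill_root)
    for number, (name, rel_path) in enumerate(PHASES, start=1):
        path = root / rel_path
        if not path.is_file():
            # Stop rather than skip: no phase may be improvised.
            raise FileNotFoundError(f"Phase {number} guide missing: {path}")
        yield number, name, path.read_text(encoding="utf-8")
```

A caller would iterate with `for number, name, text in load_phase_guides(): ...` and complete the work for one phase before advancing the loop, so a later guide is never read before the earlier phase has finished.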