Use this skill when the user's task requires precise pointer interaction across screen coordinates, such as drag-and-drop, selecting specific text spans, adjusting sliders, or drawing and arranging elements under spatial constraints. Trigger it for requests like “highlight the second sentence,” “select the paragraph starting with 'For',” “move the file to the folder,” “set the slider to 75%,” “draw a circle inside the square,” or “draw a 3x3 grid.” This skill covers GUI tasks where the agent must resolve spatial start and end points, compute geometric bounding boxes, or maintain continuous motion control to manipulate UI elements precisely.
[Case 1]
[Case 2]
[Case 3]
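As an illustration of the spatial reasoning described above, the following minimal sketch resolves the end point for a request like “set the slider to 75%” and interpolates a continuous drag path to it. The `Box` type, `slider_point`, and `drag_path` are hypothetical helpers introduced only for this example; it assumes a horizontal slider whose track bounding box is already known in screen pixels and whose value maps linearly to position.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in screen pixels (hypothetical helper type)."""
    left: float
    top: float
    width: float
    height: float


def slider_point(track: Box, fraction: float) -> tuple[float, float]:
    """Return the (x, y) point on a horizontal track for a fraction in [0, 1].

    Assumes the slider value maps linearly onto the track width.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    x = track.left + fraction * track.width
    y = track.top + track.height / 2  # vertical midline of the track
    return (x, y)


def drag_path(start, end, steps=10):
    """Linearly interpolate pointer positions from start to end, inclusive."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    (x0, y0), (x1, y1) = start, end
    return [
        (x0 + (x1 - x0) * i / steps, y0 + (y1 - y0) * i / steps)
        for i in range(steps + 1)
    ]


# Usage: a 400 px wide track starting at x=100, with the handle at the far left.
track = Box(left=100, top=200, width=400, height=10)
target = slider_point(track, 0.75)
path = drag_path(slider_point(track, 0.0), target, steps=3)
```

A real GUI may apply easing, snapping, or handle offsets, so the linear mapping here is only a starting assumption to verify against the observed interface.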
To synthesize data for this capability, you must strictly follow a 3-phase pipeline. Do not invent steps that the reference files do not describe. Read the reference file for each phase in order:
Phase 1: Environment Exploration
Read the exploration guidelines to discover raw knowledge seeds:
references/EXPLORATION.md
Phase 2: Trajectory Selection
Once Phase 1 is complete, read the selection criteria to evaluate the trajectory:
references/SELECTION.md
Phase 3: Data Synthesis
Once a trajectory passes Phase 2, read the synthesis instructions to generate the final data:
references/SYNTHESIS.md
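The ordering and gating of the three phases above can be sketched as follows. This is only an illustration of the control flow, not part of the skill itself: `run_pipeline`, the `run_phase` callback, and the `read_guide` parameter are hypothetical names, and what counts as a phase "passing" is defined by the reference files, not by this code.

```python
from pathlib import Path

# Phase names and reference paths as listed above, in the required order.
PHASES = [
    ("Environment Exploration", "references/EXPLORATION.md"),
    ("Trajectory Selection", "references/SELECTION.md"),
    ("Data Synthesis", "references/SYNTHESIS.md"),
]


def run_pipeline(run_phase, read_guide=lambda p: Path(p).read_text(encoding="utf-8")):
    """Run phases sequentially, stopping at the first one that does not pass.

    run_phase: hypothetical callback (phase_name, guide_text) -> bool that
        carries out the phase and reports whether it completed or passed.
    read_guide: how to load a reference file; defaults to reading from disk.
    Returns the names of the phases that finished successfully.
    """
    finished = []
    for name, path in PHASES:
        guide = read_guide(path)  # read the guide only when its phase is reached
        if not run_phase(name, guide):
            break  # later phases must not start before this one passes
        finished.append(name)
    return finished
```

The point of the design is that a later guide is never read before the earlier phase has passed, which mirrors the "once Phase 1 is complete" and "once a trajectory passes Phase 2" conditions.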