Use this skill when the user asks to 'describe the video in detail', 'cover all important actions', 'write a rich but accurate description', or 'make captioning depend on many events instead of one obvious action'. Trigger it for open-ended generative data where the challenge is not choosing an option but producing a faithful, event-complete description.
[Case 1]
To synthesize data for this capability, you must strictly follow a 3-phase pipeline. Do not hallucinate steps. Read the corresponding reference file for each phase sequentially:
Phase 1: Environment Exploration
Read the exploration guidelines to discover raw knowledge seeds:
references/EXPLORATION.md
Phase 2: Trajectory Selection
Once Phase 1 is complete, read the selection criteria to evaluate the trajectory:
references/SELECTION.md
Phase 3: Data Synthesis
Once a trajectory passes Phase 2, read the synthesis instructions to generate the final data:
references/SYNTHESIS.md
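The sequential reading order above can be sketched as follows. This is a minimal illustration, not part of the skill itself: the phase names and reference paths come from the text, while `PHASES`, `iter_phase_references`, and the `root` parameter are hypothetical names introduced here.

```python
from pathlib import Path

# Phase names and reference paths are taken from the pipeline description;
# everything else in this sketch is an illustrative assumption.
PHASES = [
    ("Phase 1: Environment Exploration", "references/EXPLORATION.md"),
    ("Phase 2: Trajectory Selection", "references/SELECTION.md"),
    ("Phase 3: Data Synthesis", "references/SYNTHESIS.md"),
]


def iter_phase_references(root):
    """Yield (phase, text) pairs in pipeline order.

    Each file is read only when its phase is reached, so a later
    reference is never opened before the earlier phase is consumed.
    """
    for phase, rel_path in PHASES:
        path = Path(root) / rel_path
        if not path.is_file():
            # Stop rather than guess at the content of a missing phase.
            raise FileNotFoundError(f"{phase}: missing {rel_path}")
        yield phase, path.read_text(encoding="utf-8")
```

Because the function is a generator, a caller would finish the work for one phase before advancing to the next, for example with `for phase, guidelines in iter_phase_references("."): ...`.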