Name: faithful-fine-grained-video-description
Author: Dingxingdi

1. Capability Definition & Real Case

Professional Definition: The capability to generate an open-ended video description that is both comprehensive and faithful. A comprehensive description covers the important dynamic events, subjects, and transitions in the clip; a faithful description avoids hallucinated actions and details that the video does not support.
Dimension Hierarchy: Multimodal and Generative Interpretation->Open-Ended Video Generation->faithful-fine-grained-video-description
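The two properties in the definition can be made measurable. The sketch below is a minimal illustration, not an official metric for this skill. It assumes that both the generated description and a reference annotation have already been reduced to sets of (subject, action, object) triples, and that triples match only on exact string equality. Under those assumptions, coverage is the recall of reference events and faithfulness is the precision of described events. The `Event` type, both function names, and the example triples are hypothetical.

```python
from typing import NamedTuple


class Event(NamedTuple):
    """Hypothetical atomic-event representation: (subject, action, object)."""
    subject: str
    action: str
    obj: str


def coverage(described: set[Event], reference: set[Event]) -> float:
    """Comprehensiveness: fraction of reference events the description mentions."""
    if not reference:
        return 1.0  # nothing to cover
    return len(described & reference) / len(reference)


def faithfulness(described: set[Event], reference: set[Event]) -> float:
    """Faithfulness: fraction of described events supported by the reference.
    1.0 means no hallucinated events; an empty description is vacuously faithful."""
    if not described:
        return 1.0
    return len(described & reference) / len(described)


# Invented example loosely modeled on Case 1 (not real annotations).
reference = {
    Event("woman", "lifts", "bottle"),
    Event("woman", "places", "bottle"),
    Event("woman", "lifts", "headphones"),
}
described = {
    Event("woman", "lifts", "bottle"),
    Event("woman", "lifts", "headphones"),
    Event("woman", "drinks from", "bottle"),  # unsupported by the reference: a hallucination
}
print(f"coverage={coverage(described, reference):.2f}")          # coverage=0.67
print(f"faithfulness={faithfulness(described, reference):.2f}")  # faithfulness=0.67
```

Reporting the two scores separately keeps the trade-off visible. A description can reach full coverage by listing everything plausible, but only at the cost of faithfulness, and the reverse also holds.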

[Case 1]

Initial Environment: A 14-second indoor clip recorded from above. A woman sits near a low floor table beside a bottle and a pair of headphones, repeatedly reaching, lifting, and placing objects while music plays in the room.
Real Question: Describe the video in detail.
Real Trajectory: Inspect the clip as an ordered sequence of atomic events, list the major actions without duplicating near-identical submotions, verify subject-object relations across the whole clip, and assemble a paragraph that covers all notable actions without adding unsupported details.
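The trajectory above can be sketched as a small pipeline, under strong simplifying assumptions. Events are assumed to arrive as hypothetical time-stamped (subject, action, object) annotations. "Near-identical submotions" are approximated as consecutive events with the same triple separated by at most `max_gap` seconds, and the gap value is arbitrary. "Verifying subject-object relations" is reduced to checking that both entities appear in a hypothetical set of entities observed in the clip. `AtomicEvent`, `merge_submotions`, `verified`, `assemble_paragraph`, and the example data are illustrative names and values, not part of the skill definition.

```python
from dataclasses import dataclass


@dataclass
class AtomicEvent:
    """Hypothetical annotation; times are seconds from the start of the clip."""
    start: float
    end: float
    subject: str
    action: str
    obj: str


def merge_submotions(events: list[AtomicEvent], max_gap: float = 0.5) -> list[AtomicEvent]:
    """Order events by start time and collapse consecutive events with the same
    (subject, action, object) that are at most max_gap seconds apart."""
    merged: list[AtomicEvent] = []
    for ev in sorted(events, key=lambda e: e.start):
        last = merged[-1] if merged else None
        same = last is not None and (last.subject, last.action, last.obj) == (ev.subject, ev.action, ev.obj)
        if same and ev.start - last.end <= max_gap:
            last.end = max(last.end, ev.end)  # extend the earlier copy; the input is not mutated
        else:
            merged.append(AtomicEvent(ev.start, ev.end, ev.subject, ev.action, ev.obj))
    return merged


def verified(events: list[AtomicEvent], observed: set[str]) -> list[AtomicEvent]:
    """Keep only events whose subject and object were both observed in the clip."""
    return [e for e in events if e.subject in observed and e.obj in observed]


def assemble_paragraph(events: list[AtomicEvent]) -> str:
    """Join events in temporal order without adding any detail they do not carry."""
    if not events:
        return ""
    text = ", then ".join(f"the {e.subject} {e.action} the {e.obj}" for e in events)
    return text[0].upper() + text[1:] + "."


# Invented annotations loosely modeled on Case 1.
events = [
    AtomicEvent(0.0, 1.0, "woman", "reaches for", "bottle"),
    AtomicEvent(1.0, 2.0, "woman", "lifts", "bottle"),
    AtomicEvent(2.2, 3.0, "woman", "lifts", "bottle"),   # near-identical submotion, merged
    AtomicEvent(3.5, 4.5, "woman", "places", "bottle"),
    AtomicEvent(5.0, 6.0, "woman", "lifts", "laptop"),   # object never observed, dropped
]
observed = {"woman", "bottle", "headphones", "table"}

print(assemble_paragraph(verified(merge_submotions(events), observed)))
# The woman reaches for the bottle, then the woman lifts the bottle, then the woman places the bottle.
```

The pipeline merges before it verifies so that repeated submotions are counted once before any event is filtered out. A real system would replace the exact-match checks with grounded perception of the clip.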
