Trigger this skill when the user asks questions about how far something is, how big it is, which point is closer, how tall a desk is, or whether an embodied agent can estimate physical size from what it sees. Plain-language triggers include: 'near or far,' 'how deep is that point,' 'how tall is the object,' 'measure the desk,' and 'questions about exact size instead of just naming things.'
[Case 1]
[Case 2]
[Case 3]
To synthesize data for this capability, you must strictly follow a three-phase pipeline. Do not hallucinate steps. Read the reference file for each phase in order:
Phase 1: Environment Exploration
Read the exploration guidelines to discover raw knowledge seeds:
references/EXPLORATION.md
Phase 2: Trajectory Selection
Once Phase 1 is complete, read the selection criteria to evaluate the trajectory:
references/SELECTION.md
Phase 3: Data Synthesis
Once a trajectory passes Phase 2, read the synthesis instructions to generate the final data:
references/SYNTHESIS.md
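The gated flow above can be sketched in code. This is a minimal illustration, not part of the skill itself: the reference-file paths come from this document, but `read_file` and `passes_selection` are hypothetical placeholders standing in for the actual exploration and selection logic.

```python
# Sketch of the three-phase pipeline; helper callables are hypothetical.
REFERENCE_FILES = [
    "references/EXPLORATION.md",  # Phase 1: Environment Exploration
    "references/SELECTION.md",    # Phase 2: Trajectory Selection
    "references/SYNTHESIS.md",    # Phase 3: Data Synthesis
]

def run_pipeline(read_file, passes_selection):
    """Execute the phases strictly in order; synthesis only runs
    if the trajectory passes the Phase 2 selection criteria."""
    # Phase 1: read exploration guidelines, discover raw knowledge seeds
    seeds = read_file(REFERENCE_FILES[0])

    # Phase 2: read selection criteria and evaluate the trajectory
    criteria = read_file(REFERENCE_FILES[1])
    if not passes_selection(seeds, criteria):
        return None  # trajectory rejected; no data is synthesized

    # Phase 3: read synthesis instructions, generate the final data
    instructions = read_file(REFERENCE_FILES[2])
    return (seeds, instructions)
```

Returning `None` on a failed Phase 2 check mirrors the rule that only trajectories passing selection ever reach synthesis.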