Analyze measurement validity for experimental design — auditing metric-construct alignment, proxy validity, reliability, sensitivity, and consequential validity. Argumentative lens answering "Do measurements justify the interpretation?"
- **Philosophical Mode:** Psychometric
- **Primary Question:** "Do measurements justify the interpretation?"
- **Focus:** Metric-Construct Alignment, Proxy Validity, Reliability, Sensitivity, Consequential Validity
Invoked via `/exp-lens-measurement-validity` or `/make-experiment-diag measurement`.

NEVER:

ALWAYS:
- Load the `/mermaid` skill using the Skill tool before creating any optional diagram - this is MANDATORY

Spawn Explore subagents to investigate:
- Metric Definitions
- Intended Interpretations
- Metric Computation Details
- Alternative Metrics Considered
- Construct-Metric Gap
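
When investigating metric computation details, a quick rerun-stability check can inform the Reliability and Sensitivity entries recorded later. The following is a minimal sketch, not part of this skill: the function names `seed_stability` and `distinguishable`, the threshold `k`, and the example scores are all hypothetical, and the comparison is a rough heuristic, not a formal significance test.

```python
from statistics import mean, stdev


def seed_stability(scores):
    """Summarize run-to-run spread of one metric across reruns or seeds."""
    m = mean(scores)
    s = stdev(scores)  # sample standard deviation; needs at least 2 runs
    cv = s / abs(m) if m else float("inf")  # relative spread
    return {"mean": m, "stdev": s, "cv": cv}


def distinguishable(baseline, treatment, k=2.0):
    """Crude sensitivity check (assumed heuristic): is the gap between
    condition means larger than k times the larger run-to-run stdev?"""
    gap = abs(mean(treatment) - mean(baseline))
    noise = max(stdev(baseline), stdev(treatment))
    return gap > k * noise


# Hypothetical scores from three seeds per condition.
baseline_runs = [0.71, 0.73, 0.72]
treatment_runs = [0.74, 0.70, 0.75]

stability = seed_stability(baseline_runs)
separated = distinguishable(baseline_runs, treatment_runs)
```

A metric whose seed-to-seed spread is comparable to the reported improvement would plausibly be marked "unstable" or "low" sensitivity in the inventory; the exact cutoff is a judgment call that the analysis should state explicitly.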
For each reported metric, construct a validity argument:
CRITICAL — for every metric-to-claim link:
This lens does NOT produce a primary mermaid diagram. The output is a structured validity argument. An optional simplified metric mapping diagram may be included if it clarifies the metric-construct relationship.
If including the optional diagram:
- Direction: LR (constructs on left, metrics on right)
- Minimal diagram: Construct nodes on the left, Metric nodes on the right, with edge labels indicating strength of alignment

Node Styling:
- `cli` class: Constructs and intended properties being claimed
- `output` class: Measured metrics (what is actually computed)
- `gap` class: Weak or missing alignments, proxy collapses
- `handler` class: Proxy relationships and intermediate mappings

Connection Types:
Write the output to: `temp/exp-lens-measurement-validity/exp_diag_measurement_validity_{YYYY-MM-DD_HHMMSS}.md`
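
For illustration, the timestamped path above could be built as follows. This is a sketch under assumptions: the helper name `output_path` is hypothetical, and local time is used because the source does not specify a time zone.

```python
from datetime import datetime
from pathlib import Path

# Directory and filename prefix taken from the output path stated above.
OUTPUT_DIR = Path("temp/exp-lens-measurement-validity")


def output_path(now=None):
    """Return the report path with a YYYY-MM-DD_HHMMSS timestamp."""
    now = now or datetime.now()  # assumption: local time
    stamp = now.strftime("%Y-%m-%d_%H%M%S")
    return OUTPUT_DIR / f"exp_diag_measurement_validity_{stamp}.md"


path = output_path()
```

Before writing, the directory can be created with `path.parent.mkdir(parents=True, exist_ok=True)`.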
# Measurement Validity Analysis: {Experiment Name}
**Lens:** Measurement Validity (Psychometric)
**Question:** Do measurements justify the interpretation?
**Date:** {YYYY-MM-DD}
**Scope:** {What was analyzed}
## Metric Inventory
| Metric | Construct Claimed | Computation | Reliability | Sensitivity |
|--------|-------------------|-------------|-------------|-------------|
| {metric name} | {what it claims to measure} | {aggregation/formula} | {stable / unstable / unknown} | {high / low / saturated} |
## Validity Arguments
### {Metric Name}
**Construct claimed:** {The property this metric is presented as measuring}
**Evidence for alignment:**
- {Supporting argument or citation}
**Evidence against alignment / known failure modes:**
- {Failure mode 1: e.g., gameable by surface pattern matching}
- {Failure mode 2: e.g., proxy collapses when distribution shifts}
**Reliability assessment:** {Stable under reruns? Sensitive to seed?}
**Sensitivity assessment:** {Can it distinguish meaningful differences in the relevant range?}
**Verdict:** {Strong / Partial / Weak / Unsupported}
---
## Proxy Collapse Risks
| Metric | Proxy For | Collapse Condition | Consequence |
|--------|-----------|--------------------|-------------|
| {metric} | {true construct} | {when proxy diverges from construct} | {what is falsely concluded} |
## Gaming Vulnerabilities
| Metric | Gaming Strategy | Detection Method |
|--------|----------------|-----------------|
| {metric} | {how to maximize score without improving construct} | {how to detect gaming} |
## Optional: Metric-Construct Mapping Diagram
{Include only if it clarifies alignment; omit if argument tables are sufficient}
```mermaid
%%{init: {'flowchart': {'nodeSpacing': 50, 'rankSpacing': 60, 'curve': 'basis'}}}%%
flowchart LR
%% CLASS DEFINITIONS %%
classDef cli fill:#1a237e,stroke:#7986cb,stroke-width:2px,color:#fff;
classDef stateNode fill:#004d40,stroke:#4db6ac,stroke-width:2px,color:#fff;
classDef handler fill:#e65100,stroke:#ffb74d,stroke-width:2px,color:#fff;
classDef phase fill:#6a1b9a,stroke:#ba68c8,stroke-width:2px,color:#fff;
classDef newComponent fill:#2e7d32,stroke:#81c784,stroke-width:2px,color:#fff;
classDef output fill:#00695c,stroke:#4db6ac,stroke-width:2px,color:#fff;
classDef detector fill:#b71c1c,stroke:#ef5350,stroke-width:2px,color:#fff;
classDef gap fill:#ff6f00,stroke:#ffa726,stroke-width:2px,color:#000;
classDef integration fill:#c62828,stroke:#ef9a9a,stroke-width:2px,color:#fff;
subgraph Constructs ["Intended Constructs"]
C1["Construct A<br/>━━━━━━━━━━<br/>The property claimed"]
C2["Construct B<br/>━━━━━━━━━━<br/>Another property"]
end
subgraph Metrics ["Measured Metrics"]
M1["Metric X<br/>━━━━━━━━━━<br/>Computation method"]
M2["Metric Y<br/>━━━━━━━━━━<br/>Computation method"]
end
subgraph Gaps ["Weak / Missing Alignments"]
G1["Proxy Collapse Risk<br/>━━━━━━━━━━<br/>Condition for divergence"]
end
%% ALIGNMENTS %%
C1 -->|"strong alignment"| M1
C2 -->|"proxy (weak)"| M2
M2 -.->|"diverges under"| G1
%% CLASS ASSIGNMENTS %%
class C1,C2 cli;
class M1,M2 output;
class G1 gap;
```

Color Legend:

| Color | Category | Description |
|---|---|---|
| Dark Blue | Construct | Intended properties being claimed |
| Dark Teal | Metric | What is actually computed and reported |
| Yellow | Gap | Weak alignment, proxy collapse, or missing evidence |
| Orange | Proxy | Intermediate proxy relationships |

| Metric | Verdict | Primary Concern |
|---|---|---|
| {metric} | {Strong / Partial / Weak / Unsupported} | {One-line summary of the key validity concern} |
---
## Pre-Diagram Checklist
Before creating any optional diagram, verify:
- [ ] LOADED `/mermaid` skill using the Skill tool
- [ ] Using ONLY classDef styles from the mermaid skill (no invented colors)
- [ ] Diagram will include a color legend table
---
## Related Skills
- `/make-experiment-diag` - Parent skill for lens selection
- `/mermaid` - MUST BE LOADED before creating any optional diagram
- `/exp-lens-estimand-clarity` - For auditing the upstream claim the metric is meant to support
- `/exp-lens-benchmark-representativeness` - For auditing whether the evaluation set generalizes