Guided 5-step workflow for designing psychometrically sound assessment items. Use when the user wants to create new items, write assessment questions, design forced-choice blocks, build Likert scales, or create situational judgment scenarios. Triggers: write items, create items, design items, new assessment items, MFC blocks, Likert scale, SJT scenarios, item writing, assessment design.
Guided workflow for creating psychometrically sound assessment items. Walks through 5 steps from construct definition to quality review.
Before starting, read these domain files for reference:
- `skills/psychometric-advisor/item-design.md`: item format principles and anti-patterns
- `skills/psychometric-advisor/io-psychology.md`: competency modeling context (if workplace assessment)

Ask the user to define what they're measuring:
If the user is vague, push back: "I need specific behavioral indicators to write good items. 'Leadership' is too broad — which aspect of leadership?"
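One way to keep the Step 1 answers usable for the later checks is to record each sub-competency with its behavioral indicators. The sketch below assumes a hypothetical `Construct` container and an item-to-sub-competency mapping; the names and the example construct are illustrative only. It flags sub-competencies that still lack indicators (the cue to push back), as well as sub-competencies no item targets and items that target nothing defined (checklist items 6 and 8).

```python
from dataclasses import dataclass, field


@dataclass
class Construct:
    """Step 1 answers: sub-competencies mapped to observable behavioral indicators."""
    name: str
    indicators: dict[str, list[str]] = field(default_factory=dict)

    def too_vague(self) -> list[str]:
        """Sub-competencies with no behavioral indicator yet; push back on these."""
        return [sub for sub, inds in self.indicators.items() if not inds]


def uncovered(construct: Construct, item_targets: dict[str, str]) -> set[str]:
    """Sub-competencies that no drafted item targets (checklist item 6)."""
    return set(construct.indicators) - set(item_targets.values())


def unaligned(construct: Construct, item_targets: dict[str, str]) -> set[str]:
    """Item IDs whose target is not a defined sub-competency (checklist item 8)."""
    return {item for item, sub in item_targets.items() if sub not in construct.indicators}


# Hypothetical example: "leadership" narrowed to three sub-competencies.
leadership = Construct(
    name="Team leadership",
    indicators={
        "delegation": ["Assigns tasks that match team members' skills"],
        "feedback": ["Gives specific, behavior-focused feedback after a project"],
        "vision": [],  # still vague: no observable indicator yet
    },
)
items = {"L01": "delegation", "L02": "delegation", "L03": "coaching"}
print(leadership.too_vague())                 # ['vision']
print(sorted(uncovered(leadership, items)))   # ['feedback', 'vision']
print(sorted(unaligned(leadership, items)))   # ['L03']
```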
Use this decision tree based on Step 1 answers:
Is faking a concern? (high-stakes selection, promotion)
├─ Yes → Is the construct personality/behavioral?
│   ├─ Yes → MFC (forced-choice blocks)
│   │   └─ How many dimensions? ≥3 → TIRT-compatible (Thurstonian IRT) MFC (pairs or triplets)
│   └─ No (knowledge/ability) → multiple-choice (MC) with distractor analysis
└─ No → Is the response continuous (degree of agreement)?
    ├─ Yes → Likert scale (5 or 7 points)
    └─ No → Is it about judgment in context?
        ├─ Yes → SJT (situational judgment test)
        └─ No → Consider AI-scored open-ended response
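The decision tree above can be sketched as a single function. The parameter names are hypothetical stand-ins for the Step 1 answers. One assumption is made explicit: the tree does not name a format for personality/behavioral constructs with fewer than 3 dimensions, so the sketch falls back to plain MFC in that case.

```python
def recommend_format(*, faking_concern: bool, personality_or_behavioral: bool = False,
                     n_dimensions: int = 1, continuous_response: bool = False,
                     judgment_in_context: bool = False) -> str:
    """Walk the format decision tree and return the recommended item format."""
    if faking_concern:  # high-stakes selection or promotion
        if personality_or_behavioral:
            if n_dimensions >= 3:
                return "TIRT-compatible MFC (pairs or triplets)"
            return "MFC (forced-choice blocks)"  # assumption: fewer than 3 dimensions
        return "MC with distractor analysis"  # knowledge/ability constructs
    if continuous_response:  # degree of agreement
        return "Likert scale (5 or 7 points)"
    if judgment_in_context:
        return "SJT (situational judgment test)"
    return "AI-scored open-ended response"


print(recommend_format(faking_concern=True, personality_or_behavioral=True, n_dimensions=4))
# TIRT-compatible MFC (pairs or triplets)
```

The function only encodes the default path. The user can still override the recommendation, as described next.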
Present the recommendation with rationale. If the user disagrees, note the tradeoffs but follow their choice.
Before accepting the format choice, surface the key tradeoff for the user's context:
Write items following format-specific principles:
MFC (Forced-Choice) Items:
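For TIRT-compatible blocks, Thurstonian IRT models typically recode each forced-choice block into binary comparisons between every pair of items in the block, so a triplet yields three outcomes. Within each block, options should also be similarly desirable (checklist item 1). The sketch below shows both steps. The function names, the "most like me" ranking input, the 1-7 desirability ratings and the 0.5 tolerance are all illustrative assumptions, not fixed standards.

```python
from itertools import combinations


def block_to_binary(block: list[str], ranking: list[str]) -> dict[tuple[str, str], int]:
    """Recode one forced-choice block as binary pairwise outcomes.

    block:   item IDs in fixed presentation order.
    ranking: the same IDs ordered from "most like me" to "least like me".
    Returns {(i, k): 1 if i was preferred to k else 0} for each pair in block order.
    """
    if sorted(block) != sorted(ranking):
        raise ValueError("ranking must contain exactly the items in the block")
    position = {item: rank for rank, item in enumerate(ranking)}
    return {(i, k): int(position[i] < position[k]) for i, k in combinations(block, 2)}


def desirability_balanced(block: list[str], desirability: dict[str, float],
                          tolerance: float = 0.5) -> bool:
    """True if the block's desirability ratings span no more than `tolerance`."""
    ratings = [desirability[item] for item in block]
    return max(ratings) - min(ratings) <= tolerance


# Triplet A/B/C: the respondent picked B as most like them and A as least.
print(block_to_binary(["A", "B", "C"], ["B", "C", "A"]))
# {('A', 'B'): 0, ('A', 'C'): 0, ('B', 'C'): 1}
print(desirability_balanced(["A", "B", "C"], {"A": 5.0, "B": 5.2, "C": 4.9}))  # True
```

With triplets, a "most" pick and a "least" pick together fully rank the block, so all three pairwise outcomes are known.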
Likert Items:
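Mixing positively and negatively keyed Likert items (checklist item 1) means negatively keyed responses must be reversed before scoring. On a k-point scale, the reversed score is (k + 1) minus the raw response. The sketch below assumes hypothetical item IDs and a 5-point scale by default. It also reports what share of an item set is negatively keyed.

```python
def score_likert(responses: dict[str, int], reverse_keyed: set[str],
                 points: int = 5) -> dict[str, int]:
    """Reverse-score negatively keyed items so a higher score always means more of the trait."""
    scored = {}
    for item, raw in responses.items():
        if not 1 <= raw <= points:
            raise ValueError(f"{item}: response {raw} is outside 1..{points}")
        scored[item] = points + 1 - raw if item in reverse_keyed else raw
    return scored


def negative_keying_share(items: list[str], reverse_keyed: set[str]) -> float:
    """Share of items in the set that are negatively keyed."""
    return sum(item in reverse_keyed for item in items) / len(items)


responses = {"Q1": 5, "Q2": 2, "Q3": 4}  # 5-point agreement scale
print(score_likert(responses, reverse_keyed={"Q2"}))  # {'Q1': 5, 'Q2': 4, 'Q3': 4}
print(negative_keying_share(["Q1", "Q2", "Q3"], {"Q2"}))
```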
SJT Items:
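SJTs can be scored in several ways, and this document does not prescribe one. The sketch below assumes a rating-based format: respondents rate the effectiveness of each response option, and their score is the negative mean absolute deviation from an expert-keyed mean rating per option. The option labels, the 1-7 scale and the key values are hypothetical.

```python
def sjt_rating_score(ratings: dict[str, float], expert_key: dict[str, float]) -> float:
    """Negative mean absolute deviation from expert effectiveness ratings.

    0.0 means perfect agreement with the key; more negative means worse judgment.
    """
    if ratings.keys() != expert_key.keys():
        raise ValueError("ratings must cover exactly the keyed response options")
    deviations = [abs(ratings[opt] - expert_key[opt]) for opt in expert_key]
    return -sum(deviations) / len(deviations)


# Hypothetical scenario with four response options rated on a 1-7 effectiveness scale.
expert_key = {"A": 6.2, "B": 2.0, "C": 4.5, "D": 1.3}
ratings = {"A": 6, "B": 3, "C": 4, "D": 1}
print(round(sjt_rating_score(ratings, expert_key), 2))  # -0.5
```

Best/worst-choice keying is a common alternative. Whichever scheme is used, the expert key should come from SMEs and should not be set by the item writer alone.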
AI-Scored Open-Ended:
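Before AI-scored open-ended items go live, a common check is agreement between AI scores and human rubric scores on the same responses, often reported as quadratic weighted kappa (QWK). The document does not specify an agreement metric, so QWK is an assumption here. The sketch assumes integer rubric scores on a known range. If scikit-learn is available, `sklearn.metrics.cohen_kappa_score(a, b, weights="quadratic")` computes the same statistic.

```python
def quadratic_weighted_kappa(rater_a: list[int], rater_b: list[int],
                             min_score: int, max_score: int) -> float:
    """Quadratic weighted kappa between two raters on an integer score scale."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must score the same, non-empty set of responses")
    if max_score <= min_score:
        raise ValueError("score scale needs at least two points")
    n = max_score - min_score + 1
    observed = [[0] * n for _ in range(n)]  # confusion matrix: rows = rater A
    for a, b in zip(rater_a, rater_b):
        observed[a - min_score][b - min_score] += 1
    total = len(rater_a)
    marg_a = [sum(row) for row in observed]
    marg_b = [sum(observed[i][j] for i in range(n)) for j in range(n)]
    disagreement = chance = 0.0
    for i in range(n):
        for j in range(n):
            weight = (i - j) ** 2 / (n - 1) ** 2  # quadratic penalty for distance
            disagreement += weight * observed[i][j]
            chance += weight * marg_a[i] * marg_b[j] / total  # expected under independence
    if chance == 0:
        raise ValueError("kappa is undefined when both raters give every response one score")
    return 1.0 - disagreement / chance


human = [0, 1, 2, 3, 2, 1]
ai = [0, 1, 2, 2, 2, 1]
print(quadratic_weighted_kappa(human, ai, min_score=0, max_score=3))
```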
Run each drafted item through this checklist:
| # | Check | Pass Criteria |
|---|---|---|
| 1 | Social desirability balance | Within MFC blocks: options equally attractive. Across Likert items: a mix of positively and negatively keyed items |
| 2 | Reading level | Appropriate for the target population (e.g., Flesch-Kincaid grade level or a similar index) |
| 3 | No double-barreled items | Each item measures exactly one thing |
| 4 | No leading/loaded language | Neutral framing, no value judgments embedded |
| 5 | Behavioral anchors distinct | Each anchor level represents observably different behavior |
| 6 | Dimensional coverage | Item set covers all specified sub-competencies |
| 7 | Cultural sensitivity | No idioms, references, or scenarios that disadvantage specific groups |
| 8 | Construct alignment | Item content maps to stated behavioral indicators from Step 1 |
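For check 2, the Flesch-Kincaid grade level is 0.39 × (words / sentences) + 11.8 × (syllables / words) − 15.59. The sketch below uses a rough vowel-group syllable counter, which is an approximation chosen for brevity. A maintained package (for example `textstat`) will count syllables more reliably.

```python
import re


def count_syllables(word: str) -> int:
    """Approximate syllables as vowel groups, dropping a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith(("le", "ee")) and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid grade level using the approximate syllable counter above."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


item = "I ask teammates for feedback before I finalize a decision."
print(flesch_kincaid_grade(item))
```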
Present results as:
## Item Design Review
### Items Created
- Format: [MFC / Likert / SJT / Open-ended]
- Count: [N items]
- Dimensions covered: [list]
### Quality Check Results
| Check | Status | Notes |
|-------|--------|-------|
| Social desirability | PASS/FAIL | [details] |
| Reading level | PASS/FAIL | [details] |
| ... | ... | ... |
### Items Needing Revision
- Item [ID]: [specific issue and suggested fix]
### Recommendations
- [Any structural suggestions for the item set as a whole]
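When results are collected programmatically, the template above can be filled mechanically. The sketch below assumes a hypothetical `CheckResult` record per checklist row and emits the same headings and table layout as the template. All example values are illustrative.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    check: str
    passed: bool
    notes: str = ""


def render_review(fmt: str, item_ids: list[str], dimensions: list[str],
                  results: list[CheckResult], revisions: dict[str, str],
                  recommendations: list[str]) -> str:
    """Fill the Item Design Review template and return it as a markdown string."""
    lines = [
        "## Item Design Review",
        "### Items Created",
        f"- Format: {fmt}",
        f"- Count: {len(item_ids)} items",
        f"- Dimensions covered: {', '.join(dimensions)}",
        "### Quality Check Results",
        "| Check | Status | Notes |",
        "|-------|--------|-------|",
    ]
    lines += [f"| {r.check} | {'PASS' if r.passed else 'FAIL'} | {r.notes} |" for r in results]
    lines.append("### Items Needing Revision")
    lines += [f"- Item {item_id}: {fix}" for item_id, fix in revisions.items()]
    lines.append("### Recommendations")
    lines += [f"- {rec}" for rec in recommendations]
    return "\n".join(lines)


report = render_review(
    "Likert", ["Q1", "Q2", "Q3"], ["delegation", "feedback"],
    [CheckResult("Social desirability", True, "1 of 3 items negatively keyed"),
     CheckResult("Reading level", False, "Q3 grade 11; target grade 8")],
    {"Q3": "Split into two shorter sentences"},
    ["Add items for the vision sub-competency"],
)
print(report)
```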