Generate high-quality multiple choice exam questions for any law school course. Use when asked to create MCQ exam questions, practice questions, or question banks for law school exams. Trigger phrases include "exam questions", "multiple choice", "MCQ", "practice questions", "question bank", "generate questions", or references to creating law exam content. Also trigger when asked to create narrative-based or fact-pattern-based multiple choice questions for any doctrinal law course including IP, contracts, torts, con law, civ pro, etc. Supports course presets for quick setup. Always use this skill rather than generating exam questions freehand — it enforces critical quality controls including distractor validation, cognitive taxonomy tagging, and coverage balancing derived from the psychometric research literature.
This skill dispatches several sub-agents for quality checks. Each call is guarded — the skill works without them, but item-writing compliance, distractor validation, and coverage balancing are significantly weaker.
- emphasis-map-builder — ranked emphasis map of testable doctrines from course materials.
- mcq-structural-reviewer — per-question item-writing rule checks (Haladyna-Downing-Rodriguez taxonomy).
- adversarial-balance-validator — adversarial challenge per question with fact-pattern citation requirement.
- construct-alignment-tracer — verifies every tested issue traces to assigned course materials.
- double-read-pass — fresh-eyes review of the generated exam and answer key.
- voice-style-checker — AI-tell scan.

Install from the agents/ directory of this skill's repo into ~/.claude/agents/.
This skill works in both Claude Code CLI and Claude.ai / Cowork:
- Outputs: ~/Downloads/ or a user-specified path (CLI), or /mnt/user-data/outputs/ (web)
- Course materials: project_knowledge_search and /mnt/user-data/uploads/ (web)

This skill generates research-grounded multiple choice exam questions for law school courses. It works with any doctrinal law course — the skill reads the course syllabus and materials to discover the subject matter, doctrinal areas, and coverage weights at runtime.
The quality assurance framework is based on the Haladyna-Downing-Rodriguez taxonomy of evidence-based item-writing guidelines (2002), classical test theory metrics for item analysis, and research on structural flaws and distractor functioning in MCQ assessment.
Presets store default paths and metadata for known courses. When the user mentions a preset course by name (e.g., "generate IP exam questions"), use the preset values and skip to step 2 of the workflow. The user can override any preset value.
If the user's course isn't in the preset list, fall through to the standard "ask for everything" flow.
| Field | IP |
|---|---|
| Course name | Intellectual Property |
| School | University of Pennsylvania Carey Law School |
| Professor | [Your Name] |
| Casebook | IPNTA |
| Materials path | Ask user — e.g., ~/path/to/IP/course-materials/ |
| Doctrinal areas | Trade Secret, Patent, Copyright, Trademark, Right of Publicity |
| Coverage weight note | Patent, Copyright, and Trademark are the "big three" — they should receive the most questions. Trade Secret and Right of Publicity are also studied but are minor doctrines relative to the big three. |
| Cognitive taxonomy note | Use "RI" (Regime Identification) instead of "FS" — "Which IP regime applies or best protects" |
To add a new preset: add a column to this table with the course's defaults. Fields left blank fall through to the standard discovery flow (read syllabus).
Identify the course. Check if it matches a preset. If so, load defaults and confirm with the user. If not, ask for the course name, school, professor, casebook, course materials path, and major doctrinal areas.
Read the syllabus from the course materials folder. Identify and extract the course metadata (name, school, semester), the major doctrinal areas covered, and the number of class sessions devoted to each area.
If no syllabus is found, ask the user for the course name, doctrinal areas, and approximate coverage weights.
Calculate coverage distribution by counting the number of class sessions devoted to each major doctrinal area. Use this as the proportional weight for question distribution. Round to whole questions. Present the planned distribution to the user and ask if they want to adjust it.
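As a rough illustration of that calculation, the sketch below allocates questions to each doctrinal area in proportion to its session count, using largest-remainder rounding so the per-area counts sum exactly to the requested total. The function name and the sample session counts are illustrative, not part of the skill.

```python
# Sketch: allocate exam questions proportionally to class-session counts,
# using largest-remainder rounding so the totals always sum to the target.
# The session counts below are illustrative placeholders, not real data.
from math import floor

def allocate_questions(session_counts: dict, total_questions: int) -> dict:
    total_sessions = sum(session_counts.values())
    raw = {area: total_questions * n / total_sessions for area, n in session_counts.items()}
    alloc = {area: floor(x) for area, x in raw.items()}
    # Hand the remaining questions to the areas with the largest fractional remainders.
    leftovers = total_questions - sum(alloc.values())
    for area in sorted(raw, key=lambda a: raw[a] - alloc[a], reverse=True)[:leftovers]:
        alloc[area] += 1
    return alloc

if __name__ == "__main__":
    sessions = {"Patent": 9, "Copyright": 8, "Trademark": 7, "Trade Secret": 2, "Right of Publicity": 1}
    print(allocate_questions(sessions, 30))
```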
Build the emphasis map. If the emphasis-map-builder agent is available, spawn it and pass
it the course materials folder path, course name, and doctrinal areas. The
agent reads all available materials and returns a ranked emphasis map.
Not all material types will be available for every course — the agent uses
whatever is provided. The course materials folder may contain any combination
of the following, listed in order of their role:
Primary source (defines what can be tested): the assigned readings.
Emphasis signals (determine what SHOULD be tested): lecture slides, class transcripts, and in-class problems.
Rank all testable doctrines by emphasis level. When fewer material types are available, use whatever is provided — the ranking degrades gracefully:
| Level | Criteria | MCQ Role |
|---|---|---|
| High | In readings + emphasized on slides + reinforced by transcript or class problem | Strong candidate for a question |
| Medium-High | In readings + on slides but no problem or transcript signal | Good candidate — taught but not yet practiced |
| Medium | In readings only (or on slides only for substantive slide-only material) | Fair game but should not dominate the exam |
| Excluded | Not in readings and not substantively on slides | Cannot be tested |
If only readings are available (no slides, transcripts, or problems), all doctrines rank MEDIUM and selection is based on coverage weight and the depth of treatment in the readings.
Present this emphasis ranking to the user before planning narrative clusters.
Plan the narrative clusters. Determine how many fact patterns are needed and which doctrinal areas each will cover. Each narrative should span at least 2 doctrinal areas. Plan 4-6 questions per narrative. The total across all clusters should hit the requested question count and the coverage distribution.
Present the plan to the user: number of narratives, doctrinal coverage per narrative, total question count per doctrinal area, and the course metadata that will appear on the exam. Get approval before generating.
Research consistently shows that four-option items are optimal for high-stakes assessment — three strong distractors outperform four distractors where the weakest is nonfunctional (Rodriguez 2005; Raymond et al. 2019). Do not add a fifth option.
Each question's three distractors MUST use a mix of error types: roughly two fact-dependent distractors (correct law misapplied to these facts, or true but non-responsive to them) and roughly one doctrine-testing distractor (a statement of incorrect law that doctrinal recall alone can eliminate).
Why this matters: The most common MCQ generation failure mode is distractors that ALL state obviously wrong legal principles. When every wrong answer is eliminable from pure doctrinal recall, students can answer correctly without reading the fact patterns — defeating the purpose of fact-pattern-based assessment. In testing, this pattern made 92% of questions answerable from general knowledge alone.
The fix is not to make ALL distractors state correct law. That overcorrects and fails to test whether students know the doctrine at all. The mix ensures that (1) doctrinal knowledge helps (eliminates ~1 distractor) but (2) students must still read and apply the facts to choose among the remaining options.
Self-check: For each question, ask: "If I cover the fact pattern and read only the stem and choices, can I identify the correct answer?" If yes, too many distractors state wrong law. At least 2 of 3 distractors should require the fact pattern to evaluate.
Excessive text shifts the construct being measured from doctrinal knowledge to reading speed (NBME Item-Writing Guide). Target these limits:
Tag every question with one of these codes. Aim for the specified distribution across the full exam:
| Code | Type | Description | Target |
|---|---|---|---|
| EA | Element Application | Apply specific doctrinal elements or tests to facts | 30% |
| AE | Argument Evaluation | Identify which party has the stronger or best argument | 20% |
| FB | Factor Balancing | Weigh factors in a multi-factor test against ambiguous facts | 15% |
| FS | Framework Selection | Identify which legal framework, test, or body of law governs | 15% |
| DD | Doctrinal Distinction | Distinguish between related or easily confused doctrines | 10% |
| NR | Negative Recognition | Recognize when a doctrine does not apply despite surface similarity | 10% |
Course presets may rename codes (e.g., IP uses "RI" for "FS"). Use the preset label if one is active.
No policy or theory questions — those belong on essay portions of the exam. MCQs should test application, analysis, and judgment, not abstract reasoning about legal policy.
Tag every wrong answer choice with one of these codes. Each question MUST use at least 2 different distractor types across its three wrong answers, and MUST follow the distractor mix requirement from the Answer Architecture section (~2 fact-dependent distractors + ~1 doctrine-testing distractor):
| Code | Type | Description |
|---|---|---|
| CW | Correct Rule, Wrong Application | Right legal standard, misapplied to these facts |
| PA | Plausible Argument, Not the Law | Sounds right as policy but isn't the doctrine |
| TN | True but Non-Responsive | Accurate legal statement, doesn't answer this question |
| IA | Incomplete Analysis | Gets part right, misses a critical element |
| CE | Common Student Error | Reflects a typical misconception or conflation |
| DC | Doctrine Confusion | Applies analysis from the wrong legal framework |
| SA | Superficially Attractive | Matches a surface feature but misses the deeper issue |
Tag each question with an estimated difficulty: M (Moderate), H (Hard), or VH (Very Hard).

Not every cluster needs a VH item. Distribute VH items across the exam so that roughly 10-15% of all questions are VH. Overloading clusters with hard and very hard items increases construct-irrelevant difficulty.
Quality assurance occurs at four stages, numbered in execution order. Stages 1–3 run during content development (before document generation). Stage 4 runs after document generation as a blocking output gate.
| Stage | When | What | Blocking? |
|---|---|---|---|
| 1 | During content development | Per-question item-writing rules | Mandatory |
| 2 | During content development | Substantive review (adversarial challenge, fact-answer alignment, fact dependency) | Mandatory |
| 3 | During content development | Exam-level distribution summary | Lightweight |
| 4 | After document generation | Programmatic output validation of .docx files | Blocking gate |
Check every question against these item-writing rules (Haladyna-Downing-Rodriguez). Violations are empirically associated with decreased discrimination and measurement error.
Content rules:
Stem rules:
Answer choice rules:
Note: answer choice length balance and correct answer position distribution are checked programmatically by Stage 4 with defined thresholds. Do not duplicate those checks here — Stage 1 focuses on content-level item-writing rules that require human judgment.
These tests catch genuinely flawed questions. Do not skip them.
Single best answer test:
Distractor justification:
Adversarial challenge (critical):
Fact-answer alignment check (critical):
Fact dependency test (two-direction):
Automated fact-dependency validation (optional but recommended):
After generating the full exam, run the no-materials test from the MCQ
dry-run infrastructure at ~/code/mcq-dry-run/. This sends the exam
to two AI models (GPT-4o and Gemini Flash) without any fact patterns
or course materials. Questions both models answer correctly are
fact-independent candidates. Target: fewer than 25% of questions
answered correctly by both models without materials. If this threshold
is exceeded, the distractor mix is likely wrong — too many distractors
state incorrect legal principles rather than misapplying correct ones.
Run with: python run_phase.py phase1 (cost: ~$0.10, time: ~5 min).
Course material alignment test (construct alignment):
After generating all questions, compile a one-page summary. Do not over-invest in predicted statistics — they're estimates, not measurements.
Note: correct answer position distribution and answer choice length balance are verified programmatically by Stage 4. Do not duplicate those checks here.
Flag resolution gate: Before proceeding to document generation, every flagged item from Stages 1-3 must be either (a) resolved by revising the question, answer, or explanation, or (b) explicitly accepted by the user with documented justification. Do not defer flags with notes like "being addressed separately" — they will not be addressed separately. Unresolved flags are a blocking condition for document generation, just as Stage 4 failures are a blocking condition for delivery.
This stage catches catastrophic defects — missing content, mismatched documents, broken structure — that would make the exam undeliverable.
Run the reference validation script located at
~/.claude/skills/law-mcq-generator/validate_mcq.py (CLI) or write
and execute an equivalent script (web). Do not eyeball these checks.
python3 ~/.claude/skills/law-mcq-generator/validate_mcq.py \
path/to/exam.docx path/to/answer_key.docx
The script checks all of the following. Every check must PASS.
Exam document — narrative completeness:
Exam document — question structure:
Exam document — answer choice balance:
Exam document — narrative-question coherence:
Cross-document consistency:
Every distractor analysis entry in the full answer key must be prefixed with the answer choice letter, e.g. (b) [PA]:. An entry that starts with [PA]: without the letter prefix is a blocking failure.

If any check fails: fix the defect in the generation code, regenerate the documents, and re-run the validation script. If the same check fails twice, stop and report the systematic issue to the user — do not retry a third time. A repeated failure indicates a bug in the generation logic, not a transient error.
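When the reference script is unavailable (web), the sketch below shows the flavor of one equivalent programmatic check. The function name and text patterns are assumptions — it presumes stems begin with a number and a period and answer choices begin with (a) through (d), and it should be adapted to however the real templates render that text.

```python
# Sketch of a web-environment fallback check: every question in the exam .docx
# has exactly four answer choices, and the answer key covers every question.
# Assumes stems begin with "1.", "2.", ... and choices with "(a)"-"(d)";
# adjust the patterns to match the actual generated documents.
import re
from docx import Document

def check_question_structure(exam_path: str, key_path: str) -> list:
    failures = []
    exam_paras = [p.text.strip() for p in Document(exam_path).paragraphs if p.text.strip()]
    question_idx = [i for i, t in enumerate(exam_paras) if re.match(r"^\d+\.\s", t)]
    for n, i in enumerate(question_idx, start=1):
        end = question_idx[n] if n < len(question_idx) else len(exam_paras)
        choices = [t for t in exam_paras[i:end] if re.match(r"^\([a-d]\)\s", t)]
        if len(choices) != 4:
            failures.append(f"Question {n}: expected 4 answer choices, found {len(choices)}")
    key_text = "\n".join(p.text for p in Document(key_path).paragraphs)
    for n in range(1, len(question_idx) + 1):
        if not re.search(rf"^Question {n}\b", key_text, flags=re.MULTILINE):
            failures.append(f"Answer key missing entry for Question {n}")
    return failures
```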
Generate content in two phases:
Draft phase (markdown): Write all content as .md files first.
Run all QA stages (1-3) against the markdown drafts. Iterate and
fix until every stage passes. Do not generate .docx files until
all quality reviews are satisfied.
Production phase (docx + csv): Once the markdown drafts pass
QA, generate the final .docx files from the approved content
using pre-formatted templates, then generate the CSV. Run Stage 4
validation on the .docx files as a final gate.
This separation keeps the revision loop fast (editing markdown is
cheaper than regenerating .docx files) and prevents wasted work on
documents that will need to be regenerated after QA fixes.
Generate three draft files during the draft phase:
- draft_full_set.md — exam questions (all fact patterns + questions)
- draft_answer_key_full.md — full answer key with all per-question metadata, explanations, and distractor analysis. Every distractor entry must be prefixed with the answer choice letter (e.g., (b) [PA]: explanation). Never omit the letter — it identifies which answer choice the analysis refers to.
- draft_answer_key_student.md — student-facing answer key (correct answers + concise explanations + source citations only)

Use standard markdown formatting: **bold** for headings, > for blockquotes (answer choices), --- for em-dashes, *italic* for case names and emphasis. These drafts are the authoritative source — the .docx files are generated from them.
Templates are stored in the skill directory with pre-defined styles:
from docx import Document
import os
SKILL_DIR = os.path.expanduser("~/.claude/skills/law-mcq-generator")
# Exam questions document
exam = Document(os.path.join(SKILL_DIR, "exam_template.docx"))
# Full answer key (for professor)
ak = Document(os.path.join(SKILL_DIR, "answer_key_template.docx"))
# Student answer key
sak = Document(os.path.join(SKILL_DIR, "student_answer_key_template.docx"))
Each template contains placeholder paragraphs (one per style) to keep style definitions alive. Clear all placeholder paragraphs before adding content — they exist only to preserve the style XML.
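A minimal sketch of clearing the placeholders with python-docx appears below. Removing a paragraph's XML element does not delete the style definitions, which live separately in the document's styles part; the helper name and template path are placeholders.

```python
# Sketch: remove the style-preserving placeholder paragraphs from a loaded
# template before adding content. Deleting paragraph elements leaves the
# style XML intact, since styles are stored in the document's styles part.
from docx import Document

def clear_placeholders(doc):
    for para in list(doc.paragraphs):        # copy the list; we mutate while iterating
        el = para._element
        el.getparent().remove(el)

exam = Document("exam_template.docx")        # illustrative path
clear_placeholders(exam)
```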
Footer: Page X of Y — centered, auto-updating PAGE/NUMPAGES field codes (preserved from template).

Template: exam_template.docx

Header: [ COURSE CODE + TAB + COURSE NAME + TAB + SEMESTER YEAR ]

Update the header text after loading the template. The tab stops are pre-configured for the three-column layout.
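A minimal sketch of the header update, assuming the header's first paragraph carries the pre-configured tab stops; the course code, name, and semester values are placeholders.

```python
# Sketch: overwrite the three-column header after loading the template.
# Writing to the first run (rather than replacing the paragraph text) keeps
# that run's formatting; the course values below are placeholders.
from docx import Document

exam = Document("exam_template.docx")                      # illustrative path
header_para = exam.sections[0].header.paragraphs[0]
new_text = "LAW 999\tIntellectual Property\tFall 20XX"
if header_para.runs:
    header_para.runs[0].text = new_text
    for extra in header_para.runs[1:]:                     # clear any leftover runs
        extra.text = ""
else:
    header_para.text = new_text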
Styles available in the exam template:
| Style | Purpose | Key Properties |
|---|---|---|
| First Paragraph | Title page school name line | Body Text base; center it explicitly; bold + 14pt on the run |
| Body Text | Narrative text, instructions, centered headers | Justified, 1.5 line spacing, ~9pt space before/after |
| Title | FACT PATTERN A headers | Bold, centered, no space after |
| Subtitle | The one with the [thing] | Italic, centered, no space before |
| Question | Question stems (1. + TAB + stem text) | Hanging indent (left 1", first line -0.5"), 30pt space before |
| Answer | Answer choices (a) through (d) | Right indent 0.33", 1.15 line spacing, auto-numbered (a) format |
| List Paragraph | Bulleted instruction items | Left indent 0.5" |
Answer choice numbering: The Answer style uses Word list numbering
with lowerLetter format producing (a), (b), (c), (d) labels
automatically. The template contains the numbering definitions. When
generating, create exactly 4 Answer-styled paragraphs per question.
If auto-numbering restart proves unreliable across questions, fall back
to prepending (a) , (b) , etc. as text in each Answer paragraph —
this is more reliable with python-docx.
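A minimal sketch of that fallback, assuming the Question and Answer styles exist in the loaded template and that the Answer style's automatic numbering has been removed (so the letters are not doubled); the helper name, stem, and choice texts are placeholders.

```python
# Sketch of the text-prefix fallback: write "(a) ", "(b) ", ... directly into
# each Answer-styled paragraph instead of relying on Word list-numbering restarts.
# All literal strings are placeholders.
from docx import Document

def add_question(doc, number, stem, choices):
    doc.add_paragraph(f"{number}.\t{stem}", style="Question")
    for letter, choice in zip("abcd", choices):
        doc.add_paragraph(f"({letter}) {choice}", style="Answer")

exam = Document("exam_template.docx")   # illustrative path
add_question(exam, 1, "Which doctrine governs ...?",
             ["Choice one", "Choice two", "Choice three", "Choice four"])
```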
Title page structure (in order):

- School name line (First Paragraph, centered, bold, 14pt on the run)
- … (Body Text, centered, italic on the run)
- FINAL EXAMINATION — [SEMESTER YEAR] (Body Text, centered)
- … (Body Text, centered)
- … (Body Text, centered)
- MULTIPLE CHOICE QUESTIONS (Body Text, centered, bold on the run)
- … (Body Text, centered)
- … (Body Text, justified — default alignment)
- Bulleted instruction items (List Paragraph)

Fact pattern structure (per cluster):

- [ Questions X through Y relate to Fact Pattern [LETTER] ] (Body Text, centered)
- FACT PATTERN [LETTER] (Title)
- The one with the [thing] (Subtitle)
- … (Body Text)
- Narrative paragraphs (Body Text — justified, one paragraph per logical block of the narrative)
- Question stems (Question)
- Answer choices (Answer)

End with [ END OF EXAM ] (Body Text, centered).
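A minimal sketch of emitting one fact-pattern block with the named styles; the helper name and all literal strings are placeholders.

```python
# Sketch: one fact-pattern cluster, using the exam template's named styles.
# All literal strings and the helper name are placeholders.
from docx.enum.text import WD_ALIGN_PARAGRAPH

def add_fact_pattern(doc, letter, nickname, first_q, last_q, narrative_blocks):
    banner = doc.add_paragraph(
        f"[ Questions {first_q} through {last_q} relate to Fact Pattern {letter} ]",
        style="Body Text",
    )
    banner.alignment = WD_ALIGN_PARAGRAPH.CENTER   # Body Text is justified by default
    doc.add_paragraph(f"FACT PATTERN {letter}", style="Title")
    doc.add_paragraph(f"The one with the {nickname}", style="Subtitle")
    for block in narrative_blocks:                 # one Body Text paragraph per logical block
        doc.add_paragraph(block, style="Body Text")
```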
Template: answer_key_template.docx
Styles available:
| Style | Purpose |
|---|---|
| First Paragraph | Question N headers (bold on the run) |
| Body Text | Metadata lines, explanations, glossary, summary |
| Compact | Distractor analysis entries (plain text, no bullets) |
| Normal | Horizontal rule paragraphs (paragraph bottom border) |
Document structure:
KEY TO ANSWER KEY NOTATION (centered, bold), followed by three sections:

- Cognitive taxonomy codes (EA = Element Application. [description]. …)
- Difficulty codes (M = Moderate. [description]. …)
- Distractor codes (CW = Correct Rule, Wrong Application. [description]. …)

The notation key is bounded above and below by horizontal rules.

Horizontal rules: Use a Normal-style paragraph with a bottom border (pBdr/bottom: val=single, sz=6, color=808080, space=1) and spacing before/after of 120 twips. Insert one before each question except Question 1. This creates a thin gray line for quick visual scanning.
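A minimal sketch of that horizontal-rule paragraph, built by adding a bottom border to a Normal-style paragraph through python-docx's oxml layer; the helper name is a placeholder.

```python
# Sketch: thin gray horizontal rule as a Normal-style paragraph with a bottom
# border (single, sz=6, color 808080, space=1) and 120-twip (6 pt) spacing.
from docx.oxml import OxmlElement
from docx.oxml.ns import qn
from docx.shared import Pt

def add_horizontal_rule(doc):
    para = doc.add_paragraph("", style="Normal")
    para.paragraph_format.space_before = Pt(6)   # 120 twips
    para.paragraph_format.space_after = Pt(6)
    pPr = para._p.get_or_add_pPr()
    pBdr = OxmlElement("w:pBdr")
    bottom = OxmlElement("w:bottom")
    for attr, val in (("w:val", "single"), ("w:sz", "6"), ("w:space", "1"), ("w:color", "808080")):
        bottom.set(qn(attr), val)
    pBdr.append(bottom)
    pPr.append(pBdr)
```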
Per-question structure:

- Question N (First Paragraph, bold)
- Correct Answer: (x) | Taxonomy: XX | Difficulty: M/H/VH (Body Text)
- Fact Pattern: [LETTER] (Body Text)
- Doctrinal Area: [area] (Body Text)
- Doctrinal Basis: [cases, statutes] (Body Text)
- Course Material Source: [reading pp., class #] (Body Text)
- Explanation: [2-3 sentences] (Body Text, with "Explanation:" as a separate bold run followed by the content in a non-bold run)
- Distractor Analysis: (Body Text, bold)
- One Compact paragraph per distractor: (letter) [CODE]: [explanation] — prefixed with the answer choice letter in parens (e.g., (b) [PA]: Delay in asserting...), plain text, no bullets or numbering

Content per question:
Exam-Level Summary at the end (see Stage 3 above).
Template: student_answer_key_template.docx
Styles available: First Paragraph, Body Text, Normal
Structure:
- Title lines (Body Text, centered): … [SEMESTER YEAR]; ANSWER KEY FOR MULTIPLE CHOICE QUESTIONS
- Question N — Correct Answer: (x) (First Paragraph, bold)
- Explanation (Body Text) — concise version of the full answer key explanation (2-3 sentences, no distractor analysis)
- See: [reading], [class]. (Body Text, italic on the run)
- Horizontal rule (Normal)

Generate a CSV file for quick-reference grading and data analysis. Columns:
| Column | Content |
|---|---|
| Question # | Question number (1, 2, 3, ...) |
| Correct Answer | Correct letter: a, b, c, or d |
| Doctrinal Area | E.g., Patent, Copyright (Fair Use), Trademark |
| Cognitive Taxonomy | Taxonomy code: EA, AE, FB, RI, DD, NR |
| Difficulty | Difficulty estimate: M, H, VH |
| Distractor 1 | Answer choice letter: b, c, etc. |
| Distractor 1 Code | Distractor taxonomy code: PA, CE, CW, etc. |
| Distractor 2 | Answer choice letter |
| Distractor 2 Code | Distractor taxonomy code |
| Distractor 3 | Answer choice letter |
| Distractor 3 Code | Distractor taxonomy code |
Plain letters only — no parentheses, no explanation text. The CSV is for quick-reference grading and data analysis, not for reading.
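A minimal sketch of the CSV writer, assuming per-question metadata has already been collected into dicts keyed by the column names above; the function name, filename, and sample row are placeholders.

```python
# Sketch: write the grading/analysis CSV with the exact column set above.
# The sample row is a placeholder; real rows come from the approved answer key.
import csv

COLUMNS = ["Question #", "Correct Answer", "Doctrinal Area", "Cognitive Taxonomy", "Difficulty",
           "Distractor 1", "Distractor 1 Code", "Distractor 2", "Distractor 2 Code",
           "Distractor 3", "Distractor 3 Code"]

def write_grading_csv(rows, path):
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=COLUMNS)
        writer.writeheader()
        writer.writerows(rows)

write_grading_csv(
    [{"Question #": 1, "Correct Answer": "c", "Doctrinal Area": "Copyright (Fair Use)",
      "Cognitive Taxonomy": "EA", "Difficulty": "M",
      "Distractor 1": "a", "Distractor 1 Code": "CW",
      "Distractor 2": "b", "Distractor 2 Code": "TN",
      "Distractor 3": "d", "Distractor 3 Code": "PA"}],
    "answer_key_quick_reference.csv",   # illustrative filename
)
```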
Save all files to ~/Downloads/ (CLI) or /mnt/user-data/outputs/ (web),
or to a user-specified path.
1. emphasis-map-builder agent (if available) → returns emphasis map
2. Draft draft_full_set.md, draft_answer_key_full.md, and draft_answer_key_student.md
3. mcq-structural-reviewer agent (if available) → per-question item-writing rule checks; fix any violations in the .md files
4. adversarial-balance-validator agent (if available) (type: mcq) → adversarial challenge for each question with fact-pattern citation requirement. Run fact-answer alignment check and two-direction fact dependency test.
5. construct-alignment-tracer agent (if available) → verify construct alignment. Fix any issues in the .md files.
6. Generate .docx files from the approved markdown content using templates
7. Run validate_mcq.py against the generated .docx files. This is a blocking gate. If any check fails, fix the markdown source, regenerate .docx, and re-run. If the same check fails twice, stop and report.
8. double-read-pass agent (if available) → fresh-eyes review of the .docx documents; fix any problems in the markdown, then regenerate .docx
9. voice-style-checker agent (if available) → fix any AI writing tells in the markdown, then regenerate .docx
10. Deliver final outputs (3 .docx, 1 .csv, markdown drafts retained as source)