Selecting a neuropsychological test battery is a clinical judgment task, not a checklist exercise. A competent programmer without clinical neuropsychology training will get this wrong because:
- **Not all "memory tests" test the same construct.** The CVLT-II/III assesses list learning with encoding strategies; Logical Memory tests narrative recall; the BVMT-R tests visual-spatial memory. Each is sensitive to different lesion profiles (Lezak et al., 2012, Ch. 11).
- **Test selection must match the referral question.** A dementia screen requires different instruments than a TBI return-to-work evaluation or a pre-surgical epilepsy workup.
- **Normative data are not interchangeable.** Age, education, cultural background, and premorbid ability all determine which norms to apply and whether a given score is actually impaired (Mitrushina et al., 2005).
- **Redundant tests waste time and fatigue patients.** Over-testing degrades performance and inflates apparent impairment, particularly in older adults and those with attentional deficits (Strauss et al., 2006).
## Related Skills
## When to Use This Skill

Use this skill when you need to:

- Select neuropsychological tests matched to a suspected cognitive deficit profile
- Assemble a battery for a specific referral question (dementia differential, TBI, pre-surgical, forensic)
- Advise on which cognitive domains to assess given a neurological condition
- Evaluate whether a proposed battery has adequate domain coverage or problematic redundancy
- Choose between a brief screening and a comprehensive evaluation

Do NOT use this skill for:

- Interpreting test scores (that requires a different skill)
- Diagnosing neurological conditions from test results alone
Before executing the domain-specific steps below, you MUST:

1. **State the research question** -- What cognitive domain(s) are being assessed and why?
2. **Justify the method choice** -- Why neuropsychological testing (rather than neuroimaging or a behavioral paradigm)? What alternatives were considered?
3. **Declare expected outcomes** -- What deficit pattern would support the clinical/research hypothesis?
4. **Note assumptions and limitations** -- What does this battery assume about the patient? Where could it mislead?

Present the plan to the user and WAIT for confirmation before proceeding.

For detailed methodology guidance, see the research-literacy skill.
## ⚠️ Verification Notice

This skill was generated by AI from academic literature. All parameters, thresholds, and citations require independent verification before use in research. If you find errors, please open an issue.
## Step 1: Clarify the Referral Question
The referral question determines everything. Map it to one of these categories:
| Referral Type | Primary Goal | Typical Battery Length |
|---|---|---|
| Dementia differential diagnosis | Distinguish AD vs. FTD vs. VaD vs. DLB | 3--4 hours |
| Mild cognitive impairment screening | Detect early decline, track progression | 1.5--2 hours |
| TBI evaluation (acute/subacute) | Document deficits, guide rehabilitation | 2--3 hours |
| TBI evaluation (chronic/forensic) | Quantify residual deficits, effort testing | 4--6 hours |
| Pre-surgical epilepsy workup | Lateralize/localize function, predict risk | 3--5 hours |
| Psychiatric differential | Distinguish cognitive vs. psychiatric etiology | 2--3 hours |
| Return-to-work/fitness-for-duty | Functional capacity in specific domains | 2--4 hours |

(Lezak et al., 2012, Ch. 5; Sweet et al., 2011 -- 78% of neuropsychologists use a flexible battery approach)
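The mapping above can be kept as planning scaffolding. The sketch below is an illustrative data structure only -- the category keys and hour ranges mirror the table, but the names themselves are assumptions, not part of any clinical standard:

```python
# Hypothetical lookup: referral category -> (primary goal, typical battery
# length in hours), transcribed from the table above. Planning aid only.
REFERRAL_CATEGORIES = {
    "dementia_differential": ("Distinguish AD vs. FTD vs. VaD vs. DLB", (3.0, 4.0)),
    "mci_screening": ("Detect early decline, track progression", (1.5, 2.0)),
    "tbi_acute": ("Document deficits, guide rehabilitation", (2.0, 3.0)),
    "tbi_chronic_forensic": ("Quantify residual deficits, effort testing", (4.0, 6.0)),
    "presurgical_epilepsy": ("Lateralize/localize function, predict risk", (3.0, 5.0)),
    "psychiatric_differential": ("Distinguish cognitive vs. psychiatric etiology", (2.0, 3.0)),
    "return_to_work": ("Functional capacity in specific domains", (2.0, 4.0)),
}

def plan_hours(referral: str) -> tuple[float, float]:
    """Return the (min, max) typical battery length in hours for a referral type."""
    _goal, hours = REFERRAL_CATEGORIES[referral]
    return hours

print(plan_hours("tbi_chronic_forensic"))  # -> (4.0, 6.0)
```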
## Step 2: Identify Target Cognitive Domains

Based on the referral question and suspected condition, select domains to assess. Every battery MUST cover at least attention/processing speed, memory, and executive function. Add domains based on the clinical picture.

### Cognitive Domain Framework

**Attention / Processing Speed**

- WAIS-IV Processing Speed Index (Coding, Symbol Search): ~15 min (Wechsler, 2008)
- Trail Making Test Part A: ~3 min (Reitan, 1958; deficient if >78 sec, ages 25--54)
- Continuous Performance Test (CPT-3): ~14 min (Conners, 2014)
- WAIS-IV Digit Span (Forward): ~5 min (Wechsler, 2008)

**Executive Function**

- Wisconsin Card Sorting Test (WCST-64): ~15 min (Heaton et al., 1993)
- Trail Making Test Part B: ~5 min (Reitan, 1958; deficient if >273 sec, ages 25--54)
- Stroop Color-Word Test: ~5 min (Golden, 1978)
- Tower of London / D-KEFS Tower: ~15 min (Shallice, 1982; Delis et al., 2001)
- Verbal Fluency -- FAS: ~5 min (Benton et al., 1994; mean ~36--44 words total for ages 25--54, education 12+ years)
- Verbal Fluency -- Animals: ~2 min (Strauss et al., 2006; mean ~20--24 animals for ages 25--54)

**Memory**

- WMS-IV (Logical Memory I & II, Verbal Paired Associates I & II): ~30--45 min including delay (Wechsler, 2009)
- CVLT-II/CVLT-3 (California Verbal Learning Test): ~30 min (Delis et al., 2000/2017)
## Step 3: Assemble the Battery

### Core Battery (~2.5 hours)

Every evaluation should include these unless contraindicated:

| Domain | Recommended Core Test(s) | Time |
|---|---|---|
| Premorbid estimate | TOPF or WTAR | ~10 min |
| Attention / Processing Speed | TMT-A + WAIS-IV Coding + Digit Span | ~20 min |
| Executive Function | TMT-B + Verbal Fluency (FAS + Animals) + Stroop | ~15 min |
| Verbal Memory | CVLT-II/III or RAVLT | ~30 min |
| Visual Memory | BVMT-R or RCFT recall | ~25 min |
| Language | BNT (30- or 60-item) | ~15 min |
| Visuospatial | RCFT Copy or Block Design | ~10 min |
| Motor | Grooved Pegboard (bilateral) | ~10 min |
| Effort/Validity | TOMM Trial 1 or embedded measures | ~10 min |
| **Total** | | **~145 min** |
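A minimal sketch of how the core battery's time budget and mandatory domain coverage can be sanity-checked in code. The tuples transcribe the table above; the `REQUIRED_DOMAINS` set encodes the Step 2 rule that attention/processing speed, memory, and executive function must always be covered:

```python
# (domain, tests, estimated minutes) -- transcribed from the core battery table.
CORE_BATTERY = [
    ("Premorbid estimate", "TOPF or WTAR", 10),
    ("Attention / Processing Speed", "TMT-A + WAIS-IV Coding + Digit Span", 20),
    ("Executive Function", "TMT-B + Verbal Fluency + Stroop", 15),
    ("Verbal Memory", "CVLT-II/III or RAVLT", 30),
    ("Visual Memory", "BVMT-R or RCFT recall", 25),
    ("Language", "BNT (30- or 60-item)", 15),
    ("Visuospatial", "RCFT Copy or Block Design", 10),
    ("Motor", "Grooved Pegboard (bilateral)", 10),
    ("Effort/Validity", "TOMM Trial 1 or embedded measures", 10),
]

total_minutes = sum(minutes for _, _, minutes in CORE_BATTERY)
print(total_minutes)  # -> 145, matching the table's ~145 min total

# Mandatory coverage from Step 2 (verbal memory stands in for "memory" here).
REQUIRED_DOMAINS = {"Attention / Processing Speed", "Executive Function", "Verbal Memory"}
covered = {domain for domain, _, _ in CORE_BATTERY}
assert REQUIRED_DOMAINS <= covered, "Battery is missing a mandatory domain"
```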
### Extended Battery (~4--6 hours)

Add these for complex referrals (forensic, dementia differential, pre-surgical):

| Domain | Additional Tests | Time |
|---|---|---|
| Intelligence estimate | WAIS-IV (4 index scores) | ~70 min |
| Memory (expanded) | WMS-IV (full battery) | ~75 min |
| Executive (expanded) | WCST-64 + Tower | ~30 min |
| Language (expanded) | Token Test + WAB-R | ~40 min |
| Visuospatial (expanded) | JLO + Hooper VOT | ~30 min |
| Effort/Validity (expanded) | TOMM (full) + WMT or MSVT | ~30 min |
| **Added time** | | **~275 min** |
### Assembly Rules

- **One verbal learning test:** Choose CVLT-II/III OR RAVLT, not both; they measure overlapping constructs (Strauss et al., 2006, p. 778).
- **One copy figure:** RCFT copy OR Block Design for visuoconstruction screening. Use both only if visuospatial function is the primary question.
- **Delay intervals:** Schedule verbal memory delayed recall (~20--30 min after learning) during non-memory tasks. Same for visual memory delay.
- **Fatigue management:** Place demanding tests (WCST, CVLT) early. Place motor tests as breaks. Offer rest periods every 60--90 min (Lezak et al., 2012, Ch. 6).
- **At least one validity measure:** Mandatory. Use TOMM Trial 1 (sensitivity 83%, specificity 93% at cutoff <=40; Denning, 2012) as a minimum. For forensic cases, use two or more PVTs from different modalities (Sweet et al., 2011).
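The assembly rules lend themselves to an automated check. The sketch below is a simplified, hypothetical validator: the test-name sets are illustrative, and the forensic rule is reduced to counting standalone PVTs rather than verifying distinct modalities:

```python
# Hypothetical battery validator for two of the assembly rules above:
# (1) no redundant verbal learning tests, (2) validity testing is present.
VERBAL_LEARNING = {"CVLT-II", "CVLT-3", "RAVLT"}
STANDALONE_PVTS = {"TOMM", "WMT", "MSVT"}

def check_battery(tests: list[str], forensic: bool = False) -> list[str]:
    """Return a list of rule violations (empty list means no problems found)."""
    problems = []
    verbal = [t for t in tests if t in VERBAL_LEARNING]
    if len(verbal) > 1:
        problems.append(f"Redundant verbal learning tests: {verbal}")
    pvts = [t for t in tests if t in STANDALONE_PVTS]
    if not pvts:
        problems.append("No performance validity test included")
    elif forensic and len(pvts) < 2:
        # Simplification: the actual rule requires different modalities.
        problems.append("Forensic case: include >= 2 PVTs from different modalities")
    return problems

print(check_battery(["CVLT-II", "RAVLT", "TMT-A", "TMT-B"]))
# flags both the CVLT-II/RAVLT redundancy and the missing PVT
```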
## Step 4: Select Appropriate Norms

### Normative Data Decision Tree

- **Age:** Always match. Most tests provide age-stratified norms.
- **Education:** Use education-corrected norms when available (e.g., Heaton et al., 2004 norms for TMT, WCST, verbal fluency).
- **Premorbid IQ:** For patients whose estimated IQ is far from average, IQ-adjusted norms improve accuracy over education corrections alone. In the MOANS normative data, BNT, Token Test, and JLO scores correlated more strongly with IQ (r = .47--.61) than with education (r = .24--.31) (Steinberg et al., 2005).
- **Cultural/linguistic background:** US-normed tests may overestimate impairment in non-English speakers and culturally diverse populations (Lucas et al., 2005; Peña-Casanova et al., 2009). Use population-specific norms when available (e.g., NP-NUMBRS for Spanish speakers).
- **Sex:** Match when norms are available. Grooved Pegboard shows significant sex differences, with women faster than men (Ruff & Parker, 1993); on Finger Tapping, men are faster, especially in older groups.
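Whatever stratum is selected, the arithmetic of applying norms is the same: convert the raw score to a z score against the matched stratum's mean and SD, then (optionally) to a T score. The values below are placeholders, not real norms -- the stratified means and SDs come from published tables such as Heaton et al. (2004):

```python
# Sketch of demographically corrected scoring. norm_mean / norm_sd are
# looked up from the (age, education, etc.)-matched normative stratum;
# the numbers used here are illustrative placeholders only.
def corrected_scores(raw: float, norm_mean: float, norm_sd: float) -> tuple[float, float]:
    """Return (z score, T score) for a raw score against stratum norms."""
    z = (raw - norm_mean) / norm_sd
    t = 50 + 10 * z
    return z, t

# Placeholder example: a timed score of 95 s against a stratum mean of 75 s (SD 20).
# Caution: for timed tests, higher raw scores mean worse performance, so the
# sign of z must be flipped before labeling a score as impaired.
z, t = corrected_scores(95, 75, 20)
print(z, t)  # -> 1.0 60.0
```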
### Premorbid Estimation

- **TOPF (Test of Premorbid Functioning):** 70 irregular words, co-normed with the WAIS-IV/WMS-IV, IQ range 53--141 (Pearson, 2009). Preferred for current use.
- **WTAR (Wechsler Test of Adult Reading):** predecessor to the TOPF, co-normed with the WAIS-III/WMS-III (Wechsler, 2001). Acceptable if the TOPF is unavailable.
- **Caution:** Both underestimate premorbid IQ in high-functioning individuals and overestimate it in low-functioning individuals (Bright & van der Linde, 2020). Supplement with demographic-based estimates.
## Step 5: Address Common Pitfalls

### Practice Effects in Serial Assessment

- Practice effects average d = 0.24--0.28 on composite scores at 6--12 month retest intervals (Calamia et al., 2012).
- No consensus exists on a minimum retest interval; effects persist for 2+ years on some measures (Heilbronner et al., 2010).
- Tests most susceptible: PASAT, Stroop interference, verbal fluency, TMT-B (Beglinger et al., 2005).
- Tests least susceptible: Digit Span, Letter-Number Sequencing (Beglinger et al., 2005).
- Mitigation: Use alternate forms (the CVLT-II has an alternate form; the RAVLT has multiple lists). Apply reliable change indices (RCIs) or standardized regression-based norms to interpret change (Chelune et al., 1993).
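A minimal sketch of the practice-adjusted RCI in the Jacobson-Truax form used by Chelune et al. (1993). The SD, test-retest reliability, and expected practice gain must come from the test manual or published control data; the numbers in the example are illustrative only:

```python
import math

def rci(score_t1: float, score_t2: float, sd: float, r_xx: float,
        practice_effect: float = 0.0) -> float:
    """Practice-adjusted Reliable Change Index.

    |RCI| > 1.645 (90% interval) suggests change beyond measurement error.
    sd: normative standard deviation; r_xx: test-retest reliability.
    """
    sem = sd * math.sqrt(1 - r_xx)   # standard error of measurement
    se_diff = math.sqrt(2) * sem     # standard error of the difference score
    return (score_t2 - score_t1 - practice_effect) / se_diff

# Illustrative values: an 8-point retest gain, SD 10, r = .80,
# expected practice gain of 2.5 points.
print(round(rci(50, 58, sd=10, r_xx=0.80, practice_effect=2.5), 2))  # -> 0.87
```

Here the observed gain does not exceed what measurement error plus the expected practice effect can produce, so it would not count as reliable improvement.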
### Ceiling and Floor Effects

- **Ceiling effects:** TMT-A and other simple attention tests may miss mild deficits in high-functioning individuals. Add more demanding measures (e.g., PASAT, D-KEFS verbal fluency switching) (Strauss et al., 2006).
- **Floor effects:** The WCST and other complex tests may be too difficult for moderate-to-severe dementia. Substitute simpler tasks (e.g., clock drawing, category fluency) (Lezak et al., 2012, Ch. 18).

### Ecological Validity

- Neuropsychological tests show only modest correlations (r = .3--.5) with real-world functioning (Chaytor & Schmitter-Edgecombe, 2003).
- Supplement with functional measures (e.g., Independent Living Scales, IADL checklists) when the referral question concerns everyday competence.
- Executive function tests have particularly limited ecological validity; consider adding the Behavioural Assessment of the Dysexecutive Syndrome (BADS) or real-world task simulations (Wilson et al., 1996).
### Symptom Validity Testing

- **TOMM standard cutoff (<45 on Trial 2):** specificity .96--1.00 but sensitivity only .15--.50 (Tombaugh, 1996). Use the Trial 1 cutoff of <=40 for better sensitivity (.83) at .93 specificity (Denning, 2012).
- **WMT (Word Memory Test):** more sensitive than the TOMM but with a higher false-positive rate in genuine MCI/dementia -- 67% of MCI patients are classified as "poor effort" at standard cutoffs (Green, 2003). Use hard-easy comparison scores instead (sensitivity/specificity ~95%).
- **Embedded PVTs:** Reliable Digit Span (RDS <= 7 as the failure cutoff; Greiffenstein et al., 1994), CVLT-II Forced Choice <=15 (Delis et al., 2000). Use multiple embedded measures to supplement standalone PVTs.
- **Rule:** In forensic and disability evaluations, include at least two standalone PVTs and two embedded PVTs (Larrabee, 2012).
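Sensitivity and specificity alone do not tell you how much to trust a PVT failure; that depends on the base rate of invalid performance in the referral context. The sketch below applies Bayes' rule using the TOMM Trial 1 figures cited above (Denning, 2012); the 40% forensic base rate is an illustrative assumption, not a published parameter:

```python
# Positive predictive value of a PVT failure from sensitivity, specificity,
# and an assumed base rate of invalid performance (plain Bayes' rule).
def ppv(sensitivity: float, specificity: float, base_rate: float) -> float:
    """P(performance is invalid | PVT failed)."""
    true_pos = sensitivity * base_rate
    false_pos = (1 - specificity) * (1 - base_rate)
    return true_pos / (true_pos + false_pos)

# TOMM Trial 1 at cutoff <= 40: sensitivity .83, specificity .93 (Denning, 2012).
# Assumed (illustrative) base rate of 40% in a forensic context:
print(round(ppv(0.83, 0.93, base_rate=0.40), 2))  # -> 0.89
```

At a low clinical base rate (e.g., 5--10%) the same cutoff yields a much lower PPV, which is one reason the rule above requires converging evidence from multiple PVTs rather than a single failure.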
For a comprehensive catalog of tests with administration times, normative samples, and sensitivity data, see references/test-catalog.md.
## Key References
Benton, A. L., Sivan, A. B., Hamsher, K., Varney, N. R., & Spreen, O. (1994). Contributions to Neuropsychological Assessment (2nd ed.). Oxford University Press.
Delis, D. C., Kaplan, E., & Kramer, J. H. (2001). Delis-Kaplan Executive Function System. Pearson.
Heaton, R. K., Miller, S. W., Taylor, M. J., & Grant, I. (2004). Revised Comprehensive Norms for an Expanded Halstead-Reitan Battery. PAR.
Heilbronner, R. L., Sweet, J. J., Attix, D. K., Krull, K. R., Henry, G. K., & Hart, R. P. (2010). Official position of the AACN on serial neuropsychological assessments. The Clinical Neuropsychologist, 24, 1267--1278.
Lezak, M. D., Howieson, D. B., Bigler, E. D., & Tranel, D. (2012). Neuropsychological Assessment (5th ed.). Oxford University Press.
Mitrushina, M., Boone, K. B., Razani, J., & D'Elia, L. F. (2005). Handbook of Normative Data for Neuropsychological Assessment (2nd ed.). Oxford University Press.
Strauss, E., Sherman, E. M. S., & Spreen, O. (2006). A Compendium of Neuropsychological Tests (3rd ed.). Oxford University Press.
Sweet, J. J., Nelson, N. W., & Moberg, P. J. (2011). The TCN/AACN 2010 "salary survey." The Clinical Neuropsychologist, 25, 218--245.