Guides analysis of eye-tracking reading measures including first fixation, gaze duration, regression path, and total reading time
This skill encodes expert methodological knowledge for analyzing eye-tracking data from reading experiments. A competent programmer without psycholinguistics training would likely compute a single "reading time" per word, missing the critical insight that different eye-tracking measures tap different stages of language processing. Choosing the wrong measure for your research question -- or failing to account for spillover effects, skipping patterns, and the distinction between first-pass and second-pass reading -- leads to misattribution of cognitive processes.
Use this skill when:
Do not use this skill when:
self-paced-reading-designer for that paradigm)

Before executing the domain-specific steps below, you MUST:
For detailed methodology guidance, see the research-literacy skill.
This skill was generated by AI from academic literature. All parameters, thresholds, and citations require independent verification before use in research. If you find errors, please open an issue.
The following measures are ordered from earliest to latest processing stages. This hierarchy reflects the temporal unfolding of language comprehension during reading (Rayner, 1998, 2009; Clifton et al., 2007).
| Measure | Definition | Cognitive Process | When to Use |
|---|---|---|---|
| First Fixation Duration (FFD) | Duration of the first fixation on a word during first pass | Early lexical access; initial contact with the word (Rayner, 1998) | When testing early word recognition effects (frequency, predictability) |
| Single Fixation Duration (SFD) | Duration of the only fixation on a word, when exactly one first-pass fixation occurs | Cleaner measure of early lexical processing than FFD (Rayner, 2009) | When most words receive one fixation; avoids refixation confounds |
| Gaze Duration (GD) | Sum of all first-pass fixation durations on a word (before eyes leave the word in either direction) | Lexical processing / word identification (Rayner, 1998, 2009) | Default first-pass measure for most word-level analyses |
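The first-pass definitions above can be sketched in a few lines. This is a minimal illustration, not a reference implementation: it assumes a trial is already reduced to a temporally ordered list of `(word_index, duration_ms)` pairs, and the names `first_pass_fixations` and `early_measures` are hypothetical. A word counts as skipped in first pass if a word to its right is fixated before it.

```python
from typing import Dict, List, Optional, Tuple

Fixation = Tuple[int, int]  # (word_index, duration_ms), in temporal order (assumed format)


def first_pass_fixations(fixations: List[Fixation], word: int) -> List[int]:
    """Durations of the first-pass fixations on `word`.

    An empty list means the word was skipped in first pass: it was never
    fixated, or it was first fixated only after a later word.
    """
    durations: List[int] = []
    for idx, dur in fixations:
        if idx == word:
            durations.append(dur)
        elif durations:
            break  # eyes left the word in either direction: first pass is over
        elif idx > word:
            return []  # a later word was fixated first: first-pass skip
    return durations


def early_measures(fixations: List[Fixation], word: int) -> Dict[str, Optional[int]]:
    """FFD, SFD, and GD for one word; None marks a missing value, never zero."""
    fp = first_pass_fixations(fixations, word)
    if not fp:
        return {"FFD": None, "SFD": None, "GD": None}
    return {
        "FFD": fp[0],
        "SFD": fp[0] if len(fp) == 1 else None,  # defined only for single-fixation cases
        "GD": sum(fp),
    }


trial = [(0, 210), (1, 180), (1, 120), (2, 250), (1, 300), (3, 200)]
word1 = early_measures(trial, 1)  # two first-pass fixations, so SFD is undefined
word2 = early_measures(trial, 2)  # one first-pass fixation, so FFD == SFD == GD
```

Returning `None` rather than `0` for skipped words keeps them out of first-pass means.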
| Measure | Definition | Cognitive Process | When to Use |
|---|---|---|---|
| Go-Past Time (GPT) / Regression Path Duration | Time from first fixation on the word until first fixation to the right of the word (includes any regressions out and back) | Integration difficulty; signals reanalysis of prior material (Clifton et al., 2007) | When testing syntactic garden-path effects, semantic anomalies, discourse integration |
| Total Reading Time (TRT) | Sum of all fixation durations on a word (first pass + regressions back) | Overall processing difficulty (Rayner, 1998) | When interested in total processing cost regardless of time course |
| Regression Probability (Reg-out) | Binary: did the reader make a regression from this region? | Reanalysis / comprehension difficulty (Clifton et al., 2007) | When interested in whether (not how long) reanalysis occurred |
| Regression-in Probability | Binary: did the reader regress back to this region from downstream? | Downstream difficulty triggers revisitation (Rayner & Pollatsek, 1989) | When testing whether a region is revisited after later processing fails |
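As a rough sketch of how the later measures in this table could be derived from the same kind of input, the function below assumes a trial is a temporally ordered list of `(word_index, duration_ms)` pairs. The name `late_measures` is hypothetical. Two simplifications are assumptions of this sketch: go-past time is summed over fixation durations only (saccade durations are ignored), and regression-out is scored for first pass only.

```python
from typing import Dict, List, Optional, Tuple

Fixation = Tuple[int, int]  # (word_index, duration_ms), in temporal order (assumed format)


def late_measures(fixations: List[Fixation], word: int) -> Dict[str, Optional[int]]:
    """Go-past time, total reading time, and regression flags for one word."""
    total = sum(dur for idx, dur in fixations if idx == word)

    # Regression-in: some fixation on the word directly follows a fixation to its right.
    reg_in = int(any(prev[0] > word and cur[0] == word
                     for prev, cur in zip(fixations, fixations[1:])))

    # First-pass entry: first fixation on the word, provided no word to its
    # right was fixated earlier.
    start = None
    for pos, (idx, _) in enumerate(fixations):
        if idx > word:
            break
        if idx == word:
            start = pos
            break

    go_past: Optional[int] = None
    reg_out: Optional[int] = None
    if start is not None:
        go_past, reg_out = 0, 0
        for idx, dur in fixations[start:]:
            if idx > word:
                break  # eyes moved past the word: regression path ends
            if idx < word:
                reg_out = 1  # the word was left to the left before being passed
            go_past += dur

    return {
        "GPT": go_past,
        "TRT": total if total > 0 else None,  # never-fixated words are missing (assumption)
        "RegOut": reg_out,
        "RegIn": reg_in,
    }


trial = [(0, 210), (1, 180), (1, 120), (2, 250), (1, 300), (2, 150), (3, 200)]
word2 = late_measures(trial, 2)
word1 = late_measures(trial, 1)
```

Here word 2 is fixated, the reader regresses to word 1, returns, and then moves on, so its go-past time includes the time spent re-reading word 1.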
What stage of processing is your manipulation expected to affect?
|
+-- EARLY LEXICAL (word frequency, orthographic regularity, predictability)
|   |
|   +-- Use GAZE DURATION as primary measure (Rayner, 1998, 2009)
|   +-- Report FIRST FIXATION DURATION as supplementary
|   +-- Report SINGLE FIXATION DURATION if high proportion of
|       single-fixation cases (Rayner, 2009)
|
+-- LATE LEXICAL / POST-LEXICAL (semantic plausibility, thematic fit)
|   |
|   +-- Use GAZE DURATION for early effects
|   +-- Use GO-PAST TIME for integration effects (Clifton et al., 2007)
|   +-- Use TOTAL READING TIME for overall effects
|
+-- SYNTACTIC (garden-path, structural ambiguity, reanalysis)
|   |
|   +-- Use GO-PAST TIME as primary measure (Clifton et al., 2007)
|   +-- Use REGRESSION PROBABILITY as complementary binary measure
|   +-- Effects often appear in the SPILLOVER REGION (1-2 words
|       post-critical; Rayner & Pollatsek, 1989)
|
+-- DISCOURSE / PRAGMATIC (reference resolution, inference, coherence)
|   |
|   +-- Use GO-PAST TIME and TOTAL READING TIME
|   +-- Effects are typically late and may span multiple words
|   +-- Consider REGRESSION-IN probability for earlier regions
|
+-- EXPLORATORY / UNKNOWN TIMING
    |
    +-- Report ALL major measures: FFD, GD, GPT, TRT, Reg-out
    +-- Let the pattern across measures inform process interpretation
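For use in an analysis script, the decision tree can be encoded as a small lookup. The sketch below is one possible encoding: the stage keys, the `primary`/`supplementary` field names, and the function `recommend_measures` are all hypothetical labels chosen for illustration. Unrecognized stages fall back to the exploratory branch.

```python
from typing import Dict, List

# Hypothetical encoding of the decision tree; keys and field names are illustrative.
RECOMMENDED_MEASURES: Dict[str, Dict[str, List[str]]] = {
    "early_lexical": {"primary": ["GD"], "supplementary": ["FFD", "SFD"]},
    "post_lexical": {"primary": ["GD", "GPT", "TRT"], "supplementary": []},
    "syntactic": {"primary": ["GPT"], "supplementary": ["Reg-out"]},
    "discourse": {"primary": ["GPT", "TRT"], "supplementary": ["Reg-in"]},
    "exploratory": {"primary": ["FFD", "GD", "GPT", "TRT", "Reg-out"], "supplementary": []},
}


def recommend_measures(stage: str) -> Dict[str, List[str]]:
    """Return the measures suggested for a processing stage.

    Stages that are not recognized are treated as 'exploratory', which
    mirrors the EXPLORATORY / UNKNOWN TIMING branch.
    """
    key = stage.strip().lower().replace(" ", "_").replace("-", "_")
    return RECOMMENDED_MEASURES.get(key, RECOMMENDED_MEASURES["exploratory"])
```

For syntactic manipulations the lookup only names the measures; the spillover region still has to be analyzed separately.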
| Category | Definition | Includes |
|---|---|---|
| First pass | All fixations from first entering a region until first leaving it (in either direction) | FFD, SFD, GD |
| Second pass | All fixations on a region after first leaving it | Re-reading time (TRT minus first-pass time) |
Why this matters: First-pass measures reflect initial processing; second-pass measures reflect recovery from processing difficulty encountered downstream. Conflating them obscures when processing difficulty arose.
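To make the split concrete, the sketch below separates first-pass time from re-reading time (TRT minus first-pass time) for a single word. It assumes a trial is a temporally ordered list of `(word_index, duration_ms)` pairs, and `split_passes` is a hypothetical name. As an assumption of this sketch, fixations on a word that was skipped in first pass are all counted as second pass.

```python
from typing import List, Tuple

Fixation = Tuple[int, int]  # (word_index, duration_ms), in temporal order (assumed format)


def split_passes(fixations: List[Fixation], word: int) -> Tuple[int, int]:
    """Return (first_pass_ms, rereading_ms) for one word."""
    total = sum(dur for idx, dur in fixations if idx == word)
    first_pass = 0
    started = False
    for idx, dur in fixations:
        if idx == word:
            first_pass += dur
            started = True
        elif started or idx > word:
            # Either the eyes left the word, or a later word was fixated
            # before this one (first-pass skip): first pass is closed.
            break
    return first_pass, total - first_pass


trial = [(0, 210), (1, 180), (1, 120), (2, 250), (1, 300), (2, 150), (3, 200)]
first_pass_w1, rereading_w1 = split_passes(trial, 1)
```

In this trial word 1 receives 300 ms in first pass and a further 300 ms after the reader regresses from word 2, and only the latter reflects recovery.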
Spillover is the delayed manifestation of a processing effect on fixations one or more words downstream of the critical word (Rayner & Pollatsek, 1989).
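One way to operationalize this is to define the spillover region as the one or two words after the critical word and compute a region-level gaze duration on it. The sketch below assumes fixations are `(word_index, duration_ms)` pairs in temporal order and that regions are inclusive word-index ranges. The names `spillover_region` and `region_gaze_duration` and the `width` parameter are hypothetical.

```python
from typing import List, Optional, Tuple

Fixation = Tuple[int, int]  # (word_index, duration_ms), in temporal order (assumed format)
Region = Tuple[int, int]    # inclusive (first_word, last_word)


def spillover_region(critical: int, n_words: int, width: int = 1) -> Optional[Region]:
    """Region covering up to `width` words after the critical word, or None at sentence end."""
    start = critical + 1
    end = min(critical + width, n_words - 1)
    return (start, end) if start <= end else None


def region_gaze_duration(fixations: List[Fixation], region: Region) -> Optional[int]:
    """Sum of first-pass fixations on a multi-word region; None if skipped in first pass."""
    lo, hi = region
    total = 0
    started = False
    for idx, dur in fixations:
        if lo <= idx <= hi:
            total += dur
            started = True
        elif started or idx > hi:
            break  # region was left, or was skipped in first pass
    return total if started else None


trial = [(0, 210), (1, 180), (2, 250), (3, 190), (3, 110), (4, 200)]
one_word = region_gaze_duration(trial, spillover_region(critical=2, n_words=5, width=1))
two_words = region_gaze_duration(trial, spillover_region(critical=2, n_words=5, width=2))
```

Whether to use a one-word or two-word spillover region should be fixed before looking at the data, like any other region of interest.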
| Criterion | Value | Rationale | Citation |
|---|---|---|---|
| Short fixation merge | < 80 ms within 1 character of another fixation: merge with nearest fixation | Too brief for meaningful processing; likely corrective saccade | Rayner & Pollatsek, 1989 |
| Short fixation exclude | < 80 ms (not adjacent to another fixation): exclude | Not informative for reading | Rayner & Pollatsek, 1989 |
| Long fixation exclude | > 800 ms: exclude | Likely track loss, inattention, or blink artifact | Rayner & Pollatsek, 1989 |
| Alternative long cutoff | > 1000 ms or > 1200 ms | Used in some labs; report which cutoff and justify | |
Note: Some researchers use 50 ms as the lower bound and 1000-1200 ms as the upper bound. The critical requirement is to report your exact cutoffs and the percentage of data excluded.
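The cleaning rules above might be applied as follows. This is a sketch under stated assumptions: fixation positions are expressed in character units, "another fixation" is taken to mean the temporally preceding or following fixation, and the long cutoff is applied to raw durations before merging. The `Fix` class, `clean_fixations`, and its parameter names are hypothetical, and the cutoffs are exposed as arguments so the 50 ms or 1000-1200 ms variants can be substituted.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Fix:
    x: float  # horizontal position in character units (assumed)
    dur: int  # duration in ms


def clean_fixations(fixes: List[Fix], short_ms: int = 80, long_ms: int = 800,
                    merge_dist: float = 1.0) -> Tuple[List[Fix], Dict[str, int]]:
    """Merge or drop short fixations, drop long ones, and count each action."""
    extra = [0] * len(fixes)  # duration absorbed from merged short fixations
    report = {"merged": 0, "short_excluded": 0, "long_excluded": 0}

    for k, f in enumerate(fixes):
        if f.dur >= short_ms:
            continue
        # Candidate merge targets: valid temporal neighbours within merge_dist.
        targets = [j for j in (k - 1, k + 1)
                   if 0 <= j < len(fixes)
                   and short_ms <= fixes[j].dur <= long_ms
                   and abs(fixes[j].x - f.x) <= merge_dist]
        if targets:
            nearest = min(targets, key=lambda j: abs(fixes[j].x - f.x))
            extra[nearest] += f.dur
            report["merged"] += 1
        else:
            report["short_excluded"] += 1

    cleaned: List[Fix] = []
    for k, f in enumerate(fixes):
        if f.dur < short_ms:
            continue  # already merged or excluded above
        if f.dur > long_ms:
            report["long_excluded"] += 1
            continue
        cleaned.append(Fix(f.x, f.dur + extra[k]))
    return cleaned, report


raw = [Fix(3.0, 220), Fix(3.5, 60), Fix(9.0, 240), Fix(15.0, 40), Fix(21.0, 950), Fix(27.0, 200)]
cleaned, report = clean_fixations(raw)
```

The returned counts can be converted to percentages of all fixations when reporting how much data each criterion removed.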
| Criterion | Action | Rationale |
|---|---|---|
| Track loss | Exclude trial | Unreliable position data |
| Blinks on critical region | Exclude trial | Missing fixation data on the ROI |
| First-pass skip of critical word | Exclude from first-pass measures (FFD, SFD, GD); include in TRT | Word was not fixated during first pass |
| Comprehension accuracy | Exclude participants below 80% on comprehension questions | Ensures reading for comprehension (Rayner et al., 2006) |
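A possible way to apply these exclusions in sequence while recording how much data each step removes is sketched below. The trial dictionaries with `subject`, `track_loss`, and `blink_on_roi` keys, the `accuracy` mapping, and the function `apply_exclusions` are hypothetical structures chosen for illustration. Percentages are expressed relative to the number of trials before any exclusion.

```python
from typing import Callable, Dict, List, Tuple

Trial = Dict[str, object]


def apply_exclusions(trials: List[Trial], accuracy: Dict[str, float],
                     min_accuracy: float = 0.80) -> Tuple[List[Trial], Dict[str, float]]:
    """Apply participant- and trial-level exclusions; report % of trials lost per step."""
    n_start = len(trials)
    report: Dict[str, float] = {}
    if n_start == 0:
        return [], report

    steps: List[Tuple[str, Callable[[Trial], bool]]] = [
        ("low_accuracy_participant", lambda t: accuracy[t["subject"]] < min_accuracy),
        ("track_loss", lambda t: bool(t["track_loss"])),
        ("blink_on_roi", lambda t: bool(t["blink_on_roi"])),
    ]
    kept = list(trials)
    for name, should_drop in steps:
        remaining = [t for t in kept if not should_drop(t)]
        report[name] = 100.0 * (len(kept) - len(remaining)) / n_start
        kept = remaining
    return kept, report


trials = [
    {"subject": "s1", "track_loss": False, "blink_on_roi": False},
    {"subject": "s1", "track_loss": True, "blink_on_roi": False},
    {"subject": "s1", "track_loss": False, "blink_on_roi": True},
    {"subject": "s1", "track_loss": False, "blink_on_roi": False},
    {"subject": "s2", "track_loss": False, "blink_on_roi": False},
]
kept, report = apply_exclusions(trials, accuracy={"s1": 0.92, "s2": 0.70})
```

First-pass skips are not handled here because they are not trial exclusions: the trial stays in the data and only its first-pass measures are set to missing.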
Eye-tracking reading data should be analyzed with linear mixed-effects models (LMMs) with crossed random effects for subjects and items (Baayen et al., 2008):
# R formula (lme4 syntax):
gaze_duration ~ condition + (1 + condition | subject) + (1 + condition | item)
Why crossed random effects: Reading experiments typically use a Latin square design in which every subject sees every item, but each item in only one condition, with items rotating across conditions between subjects. Both subjects and items are samples from larger populations, and both contribute variance (Clark, 1973; Baayen et al., 2008).
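The rotation of items across conditions can be illustrated with a simple modular scheme. This is one common way to build Latin square lists, shown as an assumption rather than the procedure of any particular study. The function name `assign_condition` and the zero-based integer coding of subjects, items, and conditions are hypothetical.

```python
from typing import List, Tuple


def assign_condition(subject: int, item: int, n_conditions: int) -> int:
    """Condition index for one subject-item pairing under a rotating Latin square."""
    list_id = subject % n_conditions        # which presentation list the subject gets
    return (item + list_id) % n_conditions  # items rotate through conditions across lists


def design_rows(n_subjects: int, n_items: int, n_conditions: int) -> List[Tuple[int, int, int]]:
    """All (subject, item, condition) rows: every subject crossed with every item."""
    return [(s, i, assign_condition(s, i, n_conditions))
            for s in range(n_subjects) for i in range(n_items)]


rows = design_rows(n_subjects=2, n_items=4, n_conditions=2)
```

Each row pairs one subject with one item, which is the crossed structure that motivates random effects for both grouping factors.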
| Approach | Specification | When to Use | Citation |
|---|---|---|---|
| Maximal | Random intercepts + all random slopes justified by design | Default starting point | Barr et al., 2013 |
| Parsimonious | Remove random correlations first, then random slopes that explain ~0 variance | When maximal model fails to converge | Bates et al., 2015; Matuschek et al., 2017 |
Convergence protocol (Barr et al., 2013; Bates et al., 2015):
|| in lme4)

Reading times are right-skewed and bounded below by zero. Options:
| Approach | When to Use | Citation |
|---|---|---|
| Log-transform | Simple; commonly used; adequate for many datasets | Standard in psycholinguistics |
| Inverse transform (-1000/RT) | Can outperform log for skewed RT data | Baayen & Milin, 2010 |
| Generalized LMM (Gamma) | Models the skewness directly; avoids back-transformation issues | Lo & Andrews, 2015 |
| Raw RT with residual checks | When effects are large and residuals are approximately normal | Baayen et al., 2008 |
Recommendation: Start with raw reading times in the LMM. Check residual plots. If residuals are non-normal, apply log-transformation or fit a GLMM with Gamma family and identity link (Lo & Andrews, 2015).
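The effect of the two transformations on skew can be checked directly. The sketch below uses only the Python standard library and a small made-up vector of reading times. The helper names `skewness`, `log_rt`, and `inverse_rt` are hypothetical. It inspects the raw distribution only; the recommendation above concerns model residuals, which would need to be extracted from the fitted model.

```python
import math
from statistics import mean, pstdev
from typing import List


def skewness(values: List[float]) -> float:
    """Population skewness: mean of cubed z-scores."""
    m, s = mean(values), pstdev(values)
    return mean([((v - m) / s) ** 3 for v in values])


def log_rt(rt_ms: float) -> float:
    """Natural-log transform of a reading time in ms."""
    return math.log(rt_ms)


def inverse_rt(rt_ms: float) -> float:
    """Negative inverse transform, -1000/RT, which preserves the ordering of RTs."""
    return -1000.0 / rt_ms


# Illustrative values only, not real data.
rts = [180.0, 200.0, 220.0, 240.0, 260.0, 300.0, 400.0, 900.0]
skew_raw = skewness(rts)
skew_log = skewness([log_rt(v) for v in rts])
skew_inv = skewness([inverse_rt(v) for v in rts])
```

For this right-skewed example the skew shrinks from raw to log to inverse, but which transform gives acceptable residuals is an empirical question for each dataset.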
When analyzing multiple reading measures on the same data:
These values serve as sanity checks for data quality (Rayner, 1998, 2009):
| Measure | Typical Range (Silent Reading) | Citation |
|---|---|---|
| Average fixation duration | 200-250 ms | Rayner, 1998, 2009 |
| Average saccade length | 7-9 characters (~2 degrees) | Rayner, 1998, 2009 |
| Regression rate | 10-15% of all saccades | Rayner, 1998 |
| Word skipping rate | Content words ~15%; function words ~35% | Rayner, 2009 |
| Fixation duration range | 50-500 ms (bulk of distribution) | Rayner, 1998 |
If your data deviate substantially from these benchmarks, check calibration quality, task instructions, and participant compliance.
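A benchmark check of this kind is easy to automate. The sketch below compares dataset-level summaries against three of the ranges in the table. The dictionary keys, the constant `TYPICAL_RANGES`, and the function `out_of_range` are hypothetical names. A flag is a prompt to inspect the data, not evidence that something is wrong, since populations and materials differ.

```python
from typing import Dict, List, Tuple

# Ranges copied from the benchmark table; key names are illustrative.
TYPICAL_RANGES: Dict[str, Tuple[float, float]] = {
    "mean_fixation_ms": (200.0, 250.0),
    "mean_saccade_chars": (7.0, 9.0),
    "regression_rate": (0.10, 0.15),  # proportion of all saccades
}


def out_of_range(summary: Dict[str, float]) -> List[str]:
    """Names of summary statistics that fall outside their typical range."""
    flags = []
    for name, (lo, hi) in TYPICAL_RANGES.items():
        value = summary.get(name)
        if value is not None and not (lo <= value <= hi):
            flags.append(name)
    return flags


flags = out_of_range({"mean_fixation_ms": 230.0, "mean_saccade_chars": 8.0, "regression_rate": 0.30})
```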
Using only total reading time: TRT conflates early and late processing. If you only report TRT, you cannot determine when the effect arose. Always report at least one first-pass measure (GD) and one late measure (GPT or TRT) (Clifton et al., 2007).
Ignoring spillover effects: Many effects appear 1-2 words downstream of the critical word, especially for syntactic manipulations. Always analyze the spillover region (Rayner, 1998; Rayner & Pollatsek, 1989).
Substituting zero for skipped words: Skipped words should be treated as missing data for first-pass measures, not as zero reading time. Substituting zero artificially deflates means and inflates variance.
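A small numeric illustration of the difference, using made-up gaze durations in which `None` marks a first-pass skip (the variable and function names are hypothetical):

```python
from statistics import mean
from typing import List, Optional


def mean_observed(values: List[Optional[int]]) -> Optional[float]:
    """Mean over fixated cases only; skips are treated as missing."""
    observed = [v for v in values if v is not None]
    return mean(observed) if observed else None


def mean_zero_filled(values: List[Optional[int]]) -> float:
    """Mean with skips replaced by 0 ms, shown only to illustrate the distortion."""
    return mean([0 if v is None else v for v in values])


# Illustrative gaze durations (ms) for one word across five trials.
gaze = [240, None, 260, None, 220]
skip_rate = sum(v is None for v in gaze) / len(gaze)
correct = mean_observed(gaze)
distorted = mean_zero_filled(gaze)
```

With two skips in five trials, zero-filling pulls the mean from 240 ms down to 144 ms. The skipping rate itself is better reported as a separate measure.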
Using ANOVA instead of LMMs: Separate by-subject (F1) and by-item (F2) ANOVAs have largely been superseded for psycholinguistic data. LMMs with crossed random effects model subject and item variance in a single analysis (Baayen et al., 2008; Barr et al., 2013).
Over-interpreting first fixation duration: FFD is contaminated by refixation planning. When a substantial proportion of words receive multiple first-pass fixations, GD is more informative (Rayner, 2009).
Defining ROIs post-hoc: Selecting regions of interest after seeing the data inflates Type I error. Define ROIs a priori based on linguistic theory.
Ignoring comprehension accuracy: If participants are not reading for comprehension (accuracy < 80%), eye-movement patterns are not interpretable as reflecting normal reading processes (Rayner et al., 2006).
Not reporting data loss: Always report the percentage of trials excluded at each cleaning step and the percentage of words skipped in the critical region.
Based on Clifton et al. (2007) and current standards in psycholinguistics:
See references/measure-computation-guide.md for step-by-step computation procedures and worked examples.