Guides EEG preprocessing: filtering, artifact rejection (ICA/ASR), re-referencing, interpolation
EEG preprocessing transforms raw electrophysiological recordings into clean data suitable for analysis. Unlike generic signal processing, every preprocessing decision in EEG involves domain-specific trade-offs: filtering at the wrong cutoff distorts ERP component morphology, choosing the wrong reference scheme biases topographic maps, and automated artifact rejection with incorrect parameters either leaves artifacts in the data or removes real neural signal.
A competent programmer without EEG training would not know that a 1 Hz high-pass filter is needed before ICA but distorts slow ERP components, that average reference requires a minimum of 64 channels, or that the order of preprocessing steps matters critically. This skill encodes the domain judgment required to build a correct EEG preprocessing pipeline.
Before executing the domain-specific steps below, you MUST:
For detailed methodology guidance, see the research-literacy skill.
This skill was generated by AI from academic literature. All parameters, thresholds, and citations require independent verification before use in research. If you find errors, please open an issue.
The recommended order of preprocessing steps, based on established best practices (Luck, 2014; Onton & Makeig, 2006; Bigdely-Shamlo et al., 2015):
1. Import and inspect raw data
2. Remove (mark) bad channels
3. High-pass filter
4. Line noise removal
5. Re-reference
6. ICA decomposition and artifact removal
7. Interpolate bad channels
8. Epoch and baseline correct
9. Epoch rejection by amplitude threshold
Critical ordering constraints:
Bad channels contribute noise to re-referencing, ICA, and spatial interpolation. Identify them before other steps.
| Criterion | Threshold | Source |
|---|---|---|
| Flat signal (zero variance) | Variance < 0.5 uV^2 for > 5 s | Bigdely-Shamlo et al., 2015 |
| Excessive noise | Channel variance > 3 SD above the mean of all channels | Bigdely-Shamlo et al., 2015 |
| Low correlation with neighbors | Mean correlation with neighboring channels < 0.4 | Bigdely-Shamlo et al., 2015 |
| Excessive line noise | 50/60 Hz power > 4 SD above the mean | PREP pipeline (Bigdely-Shamlo et al., 2015) |
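
The variance-based criteria in the table can be sketched in a few lines. This is a minimal illustration, not the PREP implementation: the function name `find_bad_channels`, the dictionary layout, and the choice to compute variance over the whole recording (rather than in sliding > 5 s windows) are assumptions made for the example.

```python
import math
from statistics import mean, pstdev, pvariance


def find_bad_channels(data, flat_var_uv2=0.5, noise_sd=3.0):
    """Flag flat and noisy channels.

    data: dict mapping channel name -> list of samples in microvolts.
    Simplification (assumption): variance is computed over the whole
    recording instead of a sliding > 5 s window.
    """
    variances = {ch: pvariance(samples) for ch, samples in data.items()}

    # Flat channels: variance below the flat-signal threshold.
    flat = [ch for ch, v in variances.items() if v < flat_var_uv2]

    # Noisy channels: variance more than `noise_sd` SDs above the mean
    # variance of the remaining (non-flat) channels.
    live = {ch: v for ch, v in variances.items() if ch not in flat}
    if len(live) < 2:
        return {"flat": flat, "noisy": []}
    mu = mean(list(live.values()))
    sd = pstdev(list(live.values()))
    noisy = [ch for ch, v in live.items() if sd > 0 and (v - mu) / sd > noise_sd]
    return {"flat": flat, "noisy": noisy}


def sine(amp_uv, n=500, sfreq=250.0, freq=10.0):
    """Synthetic 10 Hz oscillation with the given amplitude (microvolts)."""
    return [amp_uv * math.sin(2 * math.pi * freq * t / sfreq) for t in range(n)]


# Synthetic demo: 18 ordinary channels, one dead channel, one very noisy channel.
data = {f"ch{i:02d}": sine(10 + 0.1 * i) for i in range(18)}
data["dead"] = [0.0] * 500
data["loud"] = sine(200)

result = find_bad_channels(data)
print(result)
```

Note that a mean/SD criterion is itself inflated by the outliers it is meant to catch, so with few channels a "> 3 SD" rule may never trigger; robust statistics (median-based) are a common alternative.
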
High-pass filtering removes slow drifts from skin potentials, electrode drift, and movement artifacts.
| Analysis Goal | Cutoff Frequency | Filter Type | Source |
|---|---|---|---|
| ERP analysis | 0.1 Hz | FIR zero-phase | Luck, 2014; Tanner et al., 2015 |
| ICA decomposition | 1 Hz | FIR zero-phase | Winkler et al., 2015 |
| Time-frequency analysis | 0.1 Hz | FIR zero-phase | Cohen, 2014 |
| Slow cortical potentials | 0.01 Hz | FIR zero-phase | Luck, 2014 |
Critical domain knowledge: For ERP studies, use 0.1 Hz for the final analysis data but 1 Hz for the ICA decomposition step. The recommended workflow is:
Why not 1 Hz for ERPs? A 1 Hz high-pass filter distorts ERP waveforms: it introduces artifactual deflections of opposite polarity around a component (including shifts in the pre-stimulus baseline) and attenuates slow or sustained components such as sustained negativities or the P3b (Tanner et al., 2015; Acunzo et al., 2012).
| Parameter | Recommendation | Source |
|---|---|---|
| Filter type | FIR (Finite Impulse Response), zero-phase | Widmann et al., 2015 |
| Design | Windowed sinc (Hamming or Blackman window) | Widmann et al., 2015 |
| Transition bandwidth | 2x the cutoff frequency (e.g., 0.2 Hz for a 0.1 Hz cutoff), or the EEGLAB/MNE default | Widmann et al., 2015 |
| Filter order | Determined by transition bandwidth; for a Hamming window approximately 3.3x sampling rate / transition bandwidth | Widmann et al., 2015 |
| Phase distortion | Zero (zero-phase FIR, or forward-backward filtering with filtfilt); avoid causal filtering for offline analysis unless there is a specific reason for it | Widmann et al., 2015 |
Domain warning: IIR (e.g., Butterworth) filters applied in a single causal pass have a nonlinear phase response that shifts ERP peak latencies. Always use FIR zero-phase filters for ERP analysis unless there is a specific reason for causal filtering (Widmann et al., 2015).
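
To make the relationship between cutoff, transition bandwidth, and filter order concrete, here is a minimal pure-Python sketch of a Hamming-windowed sinc high-pass kernel. It is an illustration under stated assumptions, not the EEGLAB or MNE implementation: the helper names (`fir_order`, `highpass_kernel`, `gain`) are hypothetical, and the factor 3.3 is the value usually quoted for a Hamming window.

```python
import cmath
import math


def fir_order(sfreq, transition_bw, window_factor=3.3):
    """Even filter order for a windowed-sinc FIR filter.

    window_factor=3.3 is the usual value for a Hamming window (assumption).
    """
    order = math.ceil(window_factor * sfreq / transition_bw)
    return order + (order % 2)  # even order -> odd tap count, symmetric kernel


def highpass_kernel(sfreq, cutoff, transition_bw):
    """Hamming-windowed sinc high-pass kernel built by spectral inversion."""
    order = fir_order(sfreq, transition_bw)
    mid = order // 2
    fc = cutoff / sfreq  # cutoff in cycles per sample
    lowpass = []
    for n in range(order + 1):
        k = n - mid
        ideal = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / order)
        lowpass.append(ideal * window)
    dc_gain = sum(lowpass)
    lowpass = [v / dc_gain for v in lowpass]  # unity gain at 0 Hz
    highpass = [-v for v in lowpass]          # unit impulse minus low-pass
    highpass[mid] += 1.0
    return highpass


def gain(kernel, sfreq, freq):
    """Magnitude response of the kernel at `freq` (Hz)."""
    w = 2 * math.pi * freq / sfreq
    return abs(sum(h * cmath.exp(-1j * w * n) for n, h in enumerate(kernel)))


# 0.1 Hz cutoff with a transition band of twice the cutoff, at 250 Hz sampling.
sfreq, cutoff = 250.0, 0.1
kernel = highpass_kernel(sfreq, cutoff, transition_bw=2 * cutoff)
order = len(kernel) - 1
print(order, gain(kernel, sfreq, 0.0), gain(kernel, sfreq, 1.0))
```

The kernel is symmetric, so it has linear phase; shifting the filtered output back by `order / 2` samples yields a zero-phase result. The example also shows why low cutoffs are expensive: at 250 Hz the kernel spans several thousand samples, which is one reason high-pass filtering is applied to continuous data rather than to short epochs.
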
Remove power line noise at 50 Hz (Europe, Asia) or 60 Hz (Americas) and harmonics.
| Method | Description | When to Use | Source |
|---|---|---|---|
| Notch filter | Band-stop filter at 50/60 Hz | Simple, but removes neural signal at that frequency; not recommended for oscillatory analysis | -- |
| CleanLine | Adaptive frequency-domain regression | Preferred for most analyses; preserves neural signal near 50/60 Hz | Mullen et al., 2012 |
| ZapLine | Removes line noise via DSS decomposition | Alternative to CleanLine; effective for MEG and EEG | de Cheveigne, 2020 |
| Spectral interpolation | Interpolates the notched frequency band | Preserves spectral continuity | Leske & Dalal, 2019 |
Recommendation: Use CleanLine or ZapLine over notch filters. Notch filters can introduce ringing artifacts in the time domain and remove real neural oscillatory power in the gamma band near 50/60 Hz (Muthukumaraswamy, 2013).
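
Whichever method is used, it needs the list of frequencies to target: the line frequency plus every harmonic below the Nyquist frequency. A small sketch follows; the helper name `line_noise_frequencies` is hypothetical.

```python
def line_noise_frequencies(sfreq, line_freq):
    """Return the line frequency and its harmonics below the Nyquist frequency."""
    nyquist = sfreq / 2.0
    freqs = []
    k = 1
    while k * line_freq < nyquist:
        freqs.append(k * line_freq)
        k += 1
    return freqs


eu = line_noise_frequencies(sfreq=250.0, line_freq=50.0)  # 50 Hz mains
us = line_noise_frequencies(sfreq=500.0, line_freq=60.0)  # 60 Hz mains
print(eu, us)
```
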
EEG signals are always measured as potential differences relative to a reference. The choice of reference affects all downstream analyses.
| Reference Scheme | When to Use | Requirements | Source |
|---|---|---|---|
| Average reference | Default for dense arrays | Minimum 64 channels with good head coverage | Dien, 1998; Luck, 2014 |
| Linked mastoids | Low-density arrays (< 64 ch) | Both mastoid electrodes clean | Luck, 2014 |
| Cz reference | During ICA only (if Cz was recording reference) | -- | Convention |
| REST (Reference Electrode Standardization Technique; infinity reference) | Approximates a neutral reference at infinity | Requires a forward model; works best with dense arrays | Yao, 2001 |
Decision logic:
How many clean channels do you have?
|
+-- >= 64 with good head coverage
| --> Average reference (Dien, 1998)
|
+-- 32-63 channels
| --> Linked mastoids or average reference
| (average reference becomes unreliable with sparse coverage)
|
+-- < 32 channels
--> Linked mastoids (Luck, 2014)
Domain warning: Average reference assumes dense, uniform electrode coverage of the head. With sparse arrays (< 64 channels) or missing channels, the average reference is biased and can distort topographies (Dien, 1998).
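
The decision logic and the re-referencing arithmetic can be sketched as follows. This is an illustration only: `choose_reference` and `rereference` are hypothetical helpers, the returned labels are plain strings, and treating a dense array with poor coverage like the 32-63 channel case is an assumption the decision tree does not spell out.

```python
from statistics import mean


def choose_reference(n_clean_channels, good_coverage=True):
    """Mirror the channel-count decision logic (labels are illustrative)."""
    if n_clean_channels >= 64 and good_coverage:
        return "average"
    if n_clean_channels >= 32:
        # Also used here for >= 64 channels with poor coverage (assumption).
        return "linked mastoids or average"
    return "linked mastoids"


def rereference(data, ref_channels=None):
    """Subtract the mean of `ref_channels` (all channels if None) from every channel.

    data: dict mapping channel name -> list of samples in microvolts.
    """
    names = list(data) if ref_channels is None else list(ref_channels)
    n_samples = len(next(iter(data.values())))
    ref = [mean(data[ch][t] for ch in names) for t in range(n_samples)]
    return {ch: [x - r for x, r in zip(samples, ref)] for ch, samples in data.items()}


data = {"Fz": [4.0, 6.0], "Cz": [2.0, 2.0], "M1": [1.0, 3.0], "M2": [1.0, 1.0]}
avg = rereference(data)                  # average reference
mast = rereference(data, ["M1", "M2"])   # mean of the two mastoids
print(choose_reference(64), avg["Fz"], mast["Fz"])
```

After average referencing, the channels sum to zero at every sample. Bad channels should be excluded from `data` (or from `ref_channels`) before the reference is computed, since their noise would otherwise be spread to every channel.
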
Independent Component Analysis (ICA) separates the EEG signal into statistically independent spatial components, allowing identification and removal of artifact sources (Onton & Makeig, 2006).
| Algorithm | Pros | Cons | Source |
|---|---|---|---|
| Infomax (runica) | Standard, well-validated; most commonly used | Assumes super-Gaussian sources; separates sub-Gaussian sources (e.g., line noise) poorly | Bell & Sejnowski, 1995 |
| Extended Infomax | Handles both sub- and super-Gaussian sources | Slightly slower | Lee et al., 1999 |
| AMICA | Most accurate decomposition; can fit a mixture of multiple ICA models | Very slow; requires more data | Palmer et al., 2012 |
| FastICA | Fast computation | Less stable; sensitive to initialization | Hyvarinen, 1999 |
| PICARD | Fast, robust convergence | Newer, less validated | Ablin et al., 2018 |
Recommendation: Use Extended Infomax (default in EEGLAB) or PICARD (available in MNE-Python via method='picard'; MNE-Python's own default is FastICA) for most analyses. AMICA is preferred for high-quality research when computation time is not a constraint.
ICLabel classifies ICA components into 7 categories with probability estimates:
| Category | Action | Typical Count |
|---|---|---|
| Brain | Keep | Most components |
| Eye | Remove | 1-3 components (blinks and lateral eye movements) |
| Muscle | Remove if probability > 0.8 | 0-3 components |
| Heart | Remove if probability > 0.8 | 0-1 components |
| Line noise | Remove if probability > 0.8 | 0-1 components |
| Channel noise | Remove if probability > 0.8 | 0-2 components |
| Other | Not an artifact class by itself; inspect manually | Varies |
Recommended threshold: Remove components whose most probable class is an artifact class (eye, muscle, heart, line noise, channel noise) with probability > 0.80 (conservative) or > 0.50 (liberal) (Pion-Tonachini et al., 2019).
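
The thresholding rule can be sketched as a small selection function. The class names follow ICLabel's label set (brain, muscle, eye, heart, line noise, channel noise, other); the dictionary layout and the helper `components_to_remove` are hypothetical, and the probabilities below are made up for illustration.

```python
ARTIFACT_CLASSES = ("eye", "muscle", "heart", "line_noise", "channel_noise")


def components_to_remove(class_probs, threshold=0.80):
    """Return indices of components whose top class is an artifact class
    with probability above `threshold`.

    class_probs: list of dicts, one per component, mapping class name -> probability.
    """
    flagged = []
    for idx, probs in enumerate(class_probs):
        label, p = max(probs.items(), key=lambda item: item[1])
        if label in ARTIFACT_CLASSES and p > threshold:
            flagged.append(idx)
    return flagged


# Made-up classifier output for four components.
class_probs = [
    {"brain": 0.92, "eye": 0.02, "muscle": 0.03, "other": 0.03},
    {"brain": 0.03, "eye": 0.95, "muscle": 0.01, "other": 0.01},
    {"brain": 0.25, "eye": 0.05, "muscle": 0.60, "other": 0.10},
    {"brain": 0.10, "eye": 0.05, "muscle": 0.05, "other": 0.80},
]
conservative = components_to_remove(class_probs, threshold=0.80)
liberal = components_to_remove(class_probs, threshold=0.50)
print(conservative, liberal)
```

Components labelled "other" are never flagged automatically in this sketch; they are left for manual inspection.
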
| Artifact Type | Topography | Time Course | Power Spectrum |
|---|---|---|---|
| Blink | Frontal maximum, bilateral | Sharp transients (~300 ms) | High power at low frequencies (< 5 Hz) |
| Saccade | Frontal, lateralized (left-right asymmetry) | Step-like deflections | Low-frequency dominated |
| Cardiac | Broad, diffuse or left-lateralized | Periodic (~1 Hz) | Peak at ~1 Hz |
| Muscle | Peripheral (temporal, neck electrodes) | High-frequency broadband noise | Elevated power > 20 Hz |
Domain insight: Typically remove 1-3 components for eye artifacts and 0-2 for other artifact types. Removing more than 5-6 components total risks removing neural signal. If many components appear artifactual, the data quality may be too poor for reliable analysis (Onton & Makeig, 2006).
Artifact Subspace Reconstruction (ASR) is a real-time-capable method that identifies artifact-contaminated data segments and reconstructs them using statistics from clean calibration data (Mullen et al., 2015).
| Parameter | Default | Conservative | Liberal | Source |
|---|---|---|---|---|
| Burst criterion (SD) | 20 | 10-15 | 25-30 | Mullen et al., 2015; Chang et al., 2020 |
| Window length | 0.5 s | 0.5 s | 1.0 s | Mullen et al., 2015 |
| Max rejected channels (proportion) | 0.3 | 0.2 | 0.4 | Mullen et al., 2015 |
When to use ASR vs. ICA:
Is data heavily contaminated with non-stationary artifacts?
|
+-- YES --> ASR first (for gross artifact removal), then ICA for residual eye artifacts
|
+-- NO --> ICA alone is usually sufficient
Domain insight: ASR and ICA can be combined. Apply ASR first to remove large transient artifacts (burst criterion = 20 SD), then run ICA on the ASR-cleaned data for residual artifact removal (Chang et al., 2020).
After ICA, interpolate the bad channels identified in Step 2.
| Analysis Type | Epoch Window | Baseline Window | Source |
|---|---|---|---|
| Standard ERP | -200 to 800 ms | -200 to 0 ms | Luck, 2014 |
| Late ERP (P600, LPP) | -200 to 1000 ms | -200 to 0 ms | Luck, 2014 |
| MMN | -100 to 400 ms | -100 to 0 ms | Naatanen et al., 2007 |
| Time-frequency | -1000 to 2000 ms | -500 to -200 ms (or single-trial normalization) | Cohen, 2014 |
Domain warning: For time-frequency analysis, use a longer baseline period (-500 to -200 ms) and avoid the immediate pre-stimulus period to prevent contamination by anticipatory activity. Alternatively, use single-trial baseline normalization (Cohen, 2014).
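
Epoching and subtractive baseline correction reduce to index arithmetic. The sketch below handles a single channel with the standard ERP window (-200 to 800 ms, baseline -200 to 0 ms); the function name `epoch_and_baseline` and the synthetic signal are assumptions for illustration.

```python
from statistics import mean


def epoch_and_baseline(signal, sfreq, events, tmin, tmax, baseline):
    """Cut epochs around event samples and subtract the mean of the baseline window.

    signal: list of samples (one channel); events: sample indices of stimulus onsets;
    tmin, tmax, baseline=(start, end): times in seconds relative to the event.
    """
    start = round(tmin * sfreq)
    stop = round(tmax * sfreq)
    b0 = round((baseline[0] - tmin) * sfreq)  # baseline start, index within epoch
    b1 = round((baseline[1] - tmin) * sfreq)  # baseline end (exclusive)
    epochs = []
    for ev in events:
        lo, hi = ev + start, ev + stop
        if lo < 0 or hi >= len(signal):
            continue  # epoch runs off the recording; skip it
        segment = signal[lo:hi + 1]
        offset = mean(segment[b0:b1])
        epochs.append([v - offset for v in segment])
    return epochs


# Synthetic channel: 5 uV offset plus a 3 uV response lasting 400 ms after each event.
sfreq = 250.0
events = [10, 200, 450]
signal = [5.0] * 700
for ev in events:
    for t in range(ev, ev + 100):
        signal[t] += 3.0

epochs = epoch_and_baseline(signal, sfreq, events,
                            tmin=-0.2, tmax=0.8, baseline=(-0.2, 0.0))
print(len(epochs), len(epochs[0]), epochs[0][0], epochs[0][50])
```

The first event is dropped because its pre-stimulus window starts before the recording does. For time-frequency data the same index logic applies, but the baseline is usually applied to power (for example as a ratio or dB change) rather than subtracted from the voltage.
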
After ICA has removed stereotyped artifacts, apply amplitude-based rejection to catch remaining transient artifacts.
| Criterion | Threshold | Source |
|---|---|---|
| Peak-to-peak amplitude | Reject if > 100-150 uV | Luck, 2014 |
| Absolute amplitude | Reject if any sample exceeds +/- 75-100 uV | Luck, 2014 |
| Flat epoch | Reject if max - min < 0.5 uV (dead channel/epoch) | Bigdely-Shamlo et al., 2015 |
| Step function (for eye blinks missed by ICA) | Reject if > 80 uV step in 200 ms moving window | Luck, 2014 |
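
The criteria in the table can be combined into one per-epoch check. This is a sketch with assumed defaults picked from within the tabulated ranges; `rejection_reasons` is a hypothetical helper, and the step test uses one common formulation (mean of the second half of a 200 ms window minus the mean of the first half), which is an assumption rather than the only definition.

```python
import math


def rejection_reasons(epoch, sfreq, p2p_max=100.0, abs_max=100.0,
                      flat_min=0.5, step_max=80.0, step_win=0.2):
    """Return the sorted list of criteria an epoch violates (empty list = keep).

    epoch: dict mapping channel name -> list of samples in microvolts.
    """
    reasons = set()
    half = max(1, round(step_win * sfreq / 2))  # samples in each half window
    for samples in epoch.values():
        p2p = max(samples) - min(samples)
        if p2p > p2p_max:
            reasons.add("peak-to-peak")
        if max(abs(v) for v in samples) > abs_max:
            reasons.add("absolute")
        if p2p < flat_min:
            reasons.add("flat")
        # Step test (assumed formulation): slide a window across the epoch and
        # compare the mean of its second half with the mean of its first half.
        for i in range(0, len(samples) - 2 * half + 1):
            first = sum(samples[i:i + half]) / half
            second = sum(samples[i + half:i + 2 * half]) / half
            if abs(second - first) > step_max:
                reasons.add("step")
                break
    return sorted(reasons)


sfreq, n = 250.0, 250
clean = {"Pz": [20.0 * math.sin(2 * math.pi * 10 * t / sfreq) for t in range(n)]}
blink = {"Fp1": [150.0 * math.exp(-((t - 125) / 20.0) ** 2) for t in range(n)]}
dead = {"Oz": [0.0] * n}
saccade = {"F7": [0.0] * 125 + [90.0] * 125}

results = {name: rejection_reasons(ep, sfreq)
           for name, ep in [("clean", clean), ("blink", blink),
                            ("dead", dead), ("saccade", saccade)]}
print(results)
```

The saccade-like epoch stays below the peak-to-peak limit and is caught only by the step test, which is the reason for running both criteria.
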
| Metric | Acceptable | Concerning | Source |
|---|---|---|---|
| Proportion of epochs rejected | < 25% | > 30% indicates poor data quality | Keil et al., 2014 |
| Minimum retained trials per condition | 30+ | < 20 is unreliable for ERPs | Boudewyn et al., 2018 |
| Minimum retained trials (absolute floor) | 15 | < 10 is unusable | Luck, 2014 |
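
These quality-control thresholds are easy to check automatically after rejection. A minimal sketch, assuming a hypothetical helper `qc_flags`; the "borderline" and "marginal" labels cover ranges the table leaves unlabelled and are assumptions.

```python
def qc_flags(n_epochs_total, kept_per_condition):
    """Summarise epoch-rejection quality using the tabulated thresholds.

    kept_per_condition: dict mapping condition name -> number of retained epochs.
    """
    n_kept = sum(kept_per_condition.values())
    rejected = 1.0 - n_kept / n_epochs_total
    if rejected < 0.25:
        rejection = "acceptable"
    elif rejected > 0.30:
        rejection = "concerning"
    else:
        rejection = "borderline"  # 25-30% is not labelled in the table (assumption)

    trials = {}
    for cond, n in kept_per_condition.items():
        if n >= 30:
            trials[cond] = "acceptable"
        elif n >= 20:
            trials[cond] = "marginal"    # between the two thresholds (assumption)
        elif n >= 10:
            trials[cond] = "unreliable"
        else:
            trials[cond] = "unusable"
    return {"proportion_rejected": rejected, "rejection": rejection, "trials": trials}


report = qc_flags(200, {"standard": 120, "deviant": 18})
print(report)
```
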
| Analysis Type | Low-Pass Cutoff | Source |
|---|---|---|
| ERP (visualization and analysis) | 30 Hz | Luck, 2014 |
| ERP (preserving high-frequency info) | 40 Hz | Luck, 2014 |
| Oscillatory (alpha, beta) | No low-pass or 100 Hz | Cohen, 2014 |
| Oscillatory (gamma) | No low-pass or 200 Hz | Cohen, 2014 |
Domain warning: Low-pass filtering can be done after epoching: low-pass kernels are short, so edge artifacts are confined to the first and last few samples of each epoch, whereas high-pass filters need the continuous data. For ERP grand averages, a 20-30 Hz low-pass is common for visualization but should not be applied before statistical analysis of peak amplitudes/latencies, as it can shift peaks (Luck, 2014).
Based on Keil et al. (2014) and Luck (2014):
See references/ for step-by-step pipeline code templates and parameter lookup tables.