Traditional power analysis (e.g., using G*Power for a t-test) fails for neuroimaging because it cannot account for the massive multiple comparisons problem, spatial correlation structure, or the multi-level nature of neuroimaging inference. Neuroimaging requires simulation-based approaches that generate synthetic datasets, apply the full analysis pipeline including multiple comparison correction, and estimate power as the proportion of simulations detecting the effect.
A competent programmer without neuroimaging training would use standard power formulas and dramatically overestimate the power of a whole-brain analysis. They would not know that cluster-extent thresholds, random field theory corrections, and spatial smoothness all affect the effective number of tests, nor that pilot-data-based simulation is the gold standard for neuroimaging power analysis. This skill encodes the domain-specific methodology for simulation-based sample size planning.
When to Use This Skill
Planning sample size for a new fMRI, EEG, or MEG study
Conducting power analysis for a grant application or registered report
Estimating required N when pilot data or published effect size maps are available
Choosing between whole-brain and ROI-based analysis based on power constraints
Evaluating the statistical adequacy of a proposed or completed study
Research Planning Protocol
Before executing the domain-specific steps below, you MUST:
State the research question — What specific question is this analysis/paradigm addressing?
Justify the method choice — Why is this approach appropriate? What alternatives were considered?
Declare expected outcomes — What results would support vs. refute the hypothesis?
Note assumptions and limitations — What does this method assume? Where could it mislead?
Present the plan to the user and WAIT for confirmation before proceeding.
For detailed methodology guidance, see the research-literacy skill.
⚠️ Verification Notice
This skill was generated by AI from academic literature. All parameters, thresholds, and citations require independent verification before use in research. If you find errors, please open an issue.
Why Traditional Power Analysis Fails for Neuroimaging
The Fundamental Problem
Standard power analysis computes the sample size for a single statistical test at a given effect size, alpha, and power. Neuroimaging violates every assumption of this framework:
| Standard Assumption | Neuroimaging Reality | Consequence |
| --- | --- | --- |
| Single test | ~100,000 voxels tested | Alpha must be corrected, dramatically reducing per-test sensitivity |
| Independent tests | Voxels are spatially correlated (due to smoothing and neural organization) | Effective number of tests is far below 100,000, but hard to compute analytically |
| Known effect size | Effect size varies across voxels and depends on ROI definition | No single "effect size" characterizes a study |
| Simple test statistic | Cluster-based, TFCE, and permutation tests have complex null distributions | Power depends on the specific inference method used |
| One-level inference | Subject-level estimation plus group-level test | Both within-subject and between-subject variance affect power |

Source: Mumford & Nichols, 2008; Poldrack et al., 2017.
The Pilot-Data-Based Simulation Approach
The gold standard for neuroimaging power analysis uses pilot data to simulate full datasets at varying sample sizes (Mumford & Nichols, 2008).
Step-by-Step Procedure
Step 1: Obtain pilot data or published effect-size maps
|
Step 2: Estimate expected effect sizes at regions of interest
|
Step 3: Simulate datasets with varying N
|
Step 4: Apply full analysis pipeline (including multiple comparison correction)
|
Step 5: Compute power = proportion of simulations detecting the effect
|
Step 6: Find the N that achieves target power (typically 80% or 90%)
Step 1: Obtain Pilot Data
| Source | Quality | Requirements | Caveats |
| --- | --- | --- | --- |
| Own pilot study | Best | At least 10-15 subjects for stable variance estimates | Effect sizes from small pilots are inflated; use conservative estimates |
| Published group map | Good | Unthresholded statistical map (t-map or z-map) | May not match your exact paradigm or population |
| NeuroVault repository | Good | Search for comparable paradigms | Maps may use different preprocessing/analysis pipelines |
| Meta-analytic map (NeuroSynth, NiMARE) | Moderate | Coordinate-based or image-based meta-analysis | Averages effects across studies; may underestimate for specific paradigms |

Source: Mumford & Nichols, 2008; Poldrack et al., 2017.
Critical warning: Effect sizes from small pilot studies (N < 20) are inflated due to the winner's curse. Assume the true effect is 50-75% of the pilot estimate (Button et al., 2013).
Step 2: Estimate Effect Sizes
For ROI-based analysis:
Define the ROI a priori (from atlas, meta-analysis, or independent data)
Extract the mean effect size (Cohen's d or percent signal change) from the pilot data within the ROI
Apply the deflation correction (multiply by 0.5-0.75) for conservative estimation
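A minimal extraction sketch for the ROI-based steps above, assuming a one-sample pilot design; the file names, pilot N, and deflation factor are illustrative, not prescribed values:

```python
# Sketch: deflated ROI-level Cohen's d from a pilot group t-map.
# "pilot_group_tmap.nii.gz" and "roi_mask.nii.gz" are hypothetical files.
import numpy as np
import nibabel as nib

N_PILOT = 15          # pilot sample size (one-sample design)
DEFLATION = 0.6       # conservative shrinkage within the 0.5-0.75 range

tmap = nib.load("pilot_group_tmap.nii.gz").get_fdata()
roi = nib.load("roi_mask.nii.gz").get_fdata() > 0

d_map = tmap / np.sqrt(N_PILOT)   # within-subject conversion: d = t / sqrt(N)
d_roi = d_map[roi].mean()         # mean effect size within the a priori ROI
d_planning = DEFLATION * d_roi    # deflated estimate used for planning

print(f"pilot ROI d = {d_roi:.2f}; deflated planning d = {d_planning:.2f}")
```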
For whole-brain analysis:
Use the full unthresholded statistical map as the effect-size map
The map captures spatial variation in effect size across the brain
Power will vary by region; base the sample size determination on the primary region of interest
Step 3: Simulate Datasets
For each candidate sample size N:
Generate 1,000-5,000 simulated group maps by:
a. Sampling N subjects from a population with the estimated effect size and variance
b. Adding realistic noise (estimated from pilot residuals or assumed Gaussian with spatial smoothness matching the pilot data)
c. Creating a group-level statistical map
Apply the smoothness estimate from the pilot data (or the planned smoothing kernel) to each simulated map
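A minimal simulation sketch under simplifying assumptions: a one-sample design, unit-variance Gaussian noise smoothed to a target FWHM, and effect_map given as the voxelwise Cohen's d map from Step 2. Real pilot residuals may have non-Gaussian structure this sketch ignores:

```python
# Sketch: simulate one group-level t-map for a candidate sample size.
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(seed=0)

def simulate_group_tmap(effect_map, n_subjects, fwhm_vox, rng):
    """Simulate a one-sample group t-map; effect_map is in Cohen's d units."""
    sigma = fwhm_vox / 2.355                      # FWHM -> Gaussian sigma
    subjects = []
    for _ in range(n_subjects):
        noise = gaussian_filter(rng.standard_normal(effect_map.shape), sigma)
        noise /= noise.std()                      # rescale to unit variance
        subjects.append(effect_map + noise)       # signal + smooth N(0, 1) noise
    data = np.stack(subjects)
    # Voxelwise one-sample t-test across simulated subjects
    return data.mean(axis=0) / (data.std(axis=0, ddof=1) / np.sqrt(n_subjects))
```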
Step 4: Apply Full Analysis Pipeline
For each simulated dataset:
Compute the group-level statistical map (e.g., one-sample t-test)
Apply the planned multiple comparison correction method:
Cluster-based inference: apply cluster-defining threshold (CDT) of p < 0.001 (Eklund et al., 2016) and identify significant clusters
Voxelwise FWE: apply random field theory correction at p < 0.05 FWE
TFCE: compute TFCE image and apply permutation-based correction
FDR: apply Benjamini-Hochberg at q < 0.05
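A sketch of the cluster-based branch, assuming the cluster-extent threshold has already been computed elsewhere (via random field theory or permutation); cluster_extent is an assumed input, not something this code derives:

```python
# Sketch: cluster-extent inference at a one-sided CDT of p < 0.001.
import numpy as np
from scipy import ndimage, stats

def significant_cluster_mask(tmap, n_subjects, cluster_extent, cdt_p=0.001):
    """Boolean mask of voxels belonging to significant clusters."""
    t_crit = stats.t.ppf(1 - cdt_p, df=n_subjects - 1)  # cluster-defining threshold
    supra = tmap > t_crit
    labels, n_clusters = ndimage.label(supra)           # connected components
    sizes = ndimage.sum_labels(supra, labels,
                               index=np.arange(1, n_clusters + 1))
    keep = np.flatnonzero(sizes >= cluster_extent) + 1  # surviving cluster labels
    return np.isin(labels, keep)
```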
Step 5: Compute Power
Voxel-level power: For each voxel, power = proportion of simulations in which that voxel is significant
ROI-level power: Power = proportion of simulations in which at least one voxel in the target ROI is significant
Cluster-level power: Power = proportion of simulations in which a significant cluster overlaps with the target region
Report the power metric most relevant to your planned analysis (Mumford & Nichols, 2008).
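Tying Steps 3-5 together, a sketch of the outer power loop using the two functions sketched above; effect_map, roi_mask, FWHM_VOX, and CLUSTER_EXTENT are assumed inputs:

```python
# Sketch: ROI-level power = proportion of simulations in which at least
# one voxel inside the target ROI survives correction.
n_sims = 1000                                    # 1,000-5,000 per Step 3
for n in (20, 30, 40, 50):                       # candidate sample sizes
    hits = 0
    for _ in range(n_sims):
        tmap = simulate_group_tmap(effect_map, n, FWHM_VOX, rng)
        sig = significant_cluster_mask(tmap, n, CLUSTER_EXTENT)
        hits += bool((sig & roi_mask).any())     # detection anywhere in the ROI
    print(f"N = {n}: estimated power = {hits / n_sims:.2f}")
```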
Tools and Implementations
fMRIpower (Mumford & Nichols, 2008)
| Feature | Description |
| --- | --- |
| Input | Pilot group-level statistical maps (from FSL) |
| Method | Resamples from pilot data to estimate power at varying N |
| Output | Power curves for specified ROIs at different sample sizes |
| Requirements | FSL, R; pilot data from at least 10-15 subjects |
| Strengths | Uses actual pilot data; accounts for design-specific temporal autocorrelation |
| Limitations | Assumes pilot effect sizes are representative; FSL-specific |
NeuroPowerTools (Durnez et al., 2016)
| Feature | Description |
| --- | --- |
| Input | Unthresholded statistical map (any software) |
| Method | Fits a mixture model to the peak distribution; estimates prevalence and effect size |
| Output | Power estimates at varying N; optimal sample size for target power |
| Strengths | Fast; requires only a single unthresholded map from any analysis package |
| Limitations | Peak-based model; assumes the input map is representative of the planned study |

Note: Cluster-simulation tools using the non-Gaussian ACF smoothness model (Cox et al., 2017) estimate the minimum detectable cluster size (cluster-extent threshold) at a given sample size. They do not compute power directly; they are cluster-thresholding aids, not full power tools.
ROI-Based Power Shortcuts
When full simulation is impractical, ROI-based power analysis provides a reasonable alternative:
Procedure
Define the target ROI a priori (from atlas, meta-analysis, or independent data)
Extract the expected effect size (Cohen's d) from pilot data or literature:
Mean activation within ROI / standard deviation of activation across subjects
Use standard power formulas (G*Power or similar) with the ROI-level effect size
No multiple comparison correction is needed for a single a priori ROI
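For example, with statsmodels as a scriptable stand-in for G*Power (one-sample design; the effect size is illustrative):

```python
# Sketch: analytical power for a single a priori ROI (one-sample t-test).
from statsmodels.stats.power import TTestPower

d_planning = 0.5     # deflated ROI-level Cohen's d (illustrative)
n_required = TTestPower().solve_power(effect_size=d_planning, alpha=0.05,
                                      power=0.80, alternative="two-sided")
print(f"required N: {n_required:.1f}")   # about 34 for d = 0.5, matching G*Power
```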
Effect Size Extraction from Published Results
| Published Statistic | Conversion to Cohen's d | Notes |
| --- | --- | --- |
| t-value (within-subject) | d = t / sqrt(N) | Standard formula (one-sample or paired) |
| t-value (between-group) | d = 2t / sqrt(df) | Standard formula (equal group sizes) |
| z-value | d ≈ z / sqrt(N) | Approximation, best for large N |
| Percent signal change + SD | d = mean_PSC / SD_PSC | Direct computation |
| Partial eta-squared | d = 2 * sqrt(eta^2 / (1 - eta^2)) | Two-group case: f = sqrt(eta^2 / (1 - eta^2)) and d = 2f; omitting the factor of 2 yields Cohen's f, not d |
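A small sketch of these conversions; the partial eta-squared function assumes the two-group relation d = 2f:

```python
# Sketch: effect-size conversions from the table above.
import numpy as np

def d_from_t_within(t, n):          # one-sample / paired: d = t / sqrt(N)
    return t / np.sqrt(n)

def d_from_t_between(t, df):        # two equal groups: d = 2t / sqrt(df)
    return 2 * t / np.sqrt(df)

def d_from_eta2(eta2):              # partial eta^2 -> f -> d (two-group case)
    f = np.sqrt(eta2 / (1 - eta2))  # Cohen's f
    return 2 * f

print(d_from_t_within(4.0, 16))     # 1.0
```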
Meta-Analytic Effect Sizes
Use coordinate-based meta-analysis tools to estimate effect sizes at specific brain locations:
| Tool | Method | Output | Source |
| --- | --- | --- | --- |
| NiMARE | ALE, MKDA, or other CBMA | Meta-analytic map; extract effect at ROI | Salo et al., 2023 |
| NeuroSynth | Automated term-based meta-analysis | Association maps; extract effect at coordinates | Yarkoni et al., 2011 |
| BrainMap | ALE meta-analysis | Coordinate-based likelihood maps | Laird et al., 2005 |
Caveat: Meta-analytic effect sizes aggregate across many studies with different designs, populations, and analysis pipelines. They provide a reasonable lower bound but may not match your specific paradigm (Yarkoni et al., 2011).
Current Sample Size Recommendations
Landmark Findings
| Finding | Recommendation | Source |
| --- | --- | --- |
| Brain-behavior associations require massive samples for replicability | N > 2,000 for whole-brain brain-behavior correlations | Marek et al., 2022 |
| N = 20 gives ~50% power for medium fMRI effects | N = 40+ for 80% power with medium effects | Poldrack et al., 2017 |
| 80% power at uncorrected p < 0.001 requires N ≈ 40 for d = 0.8 | N = 40 per group for large between-group effects | Turner et al., 2018 |
| Cluster-based inference with CDT p < 0.01 produces inflated false positives | Use CDT p < 0.001 and increase N to compensate for reduced sensitivity | Eklund et al., 2016 |
| Within-subject designs are much more powerful than between-subject designs | Prefer within-subject designs when scientifically appropriate | Mumford & Nichols, 2008 |
Minimum Sample Size Table
| Analysis Type | Minimum N (80% Power) | Effect Size Assumed | Correction Method | Source |
| --- | --- | --- | --- | --- |
| Within-subject activation (whole-brain) | 25-30 | d = 0.8 (large) | Cluster-based, CDT p < 0.001 | Desmond & Glover, 2002 |
| Between-group (whole-brain, large effect) | 20-25 per group | d = 0.8 | Cluster-based, CDT p < 0.001 | Thirion et al., 2007 |
| Between-group (whole-brain, medium effect) | 40-50 per group | d = 0.5 | Cluster-based, CDT p < 0.001 | Poldrack et al., 2017 |
| ROI-based (single a priori ROI) | 15-25 | d = 0.5-0.8 | Uncorrected (single test) | Desmond & Glover, 2002 |
| Resting-state connectivity (group mean) | 25-40 | r = 0.3-0.5 | FDR or NBS | Smith et al., 2011 |
| Brain-behavior correlation (whole-brain) | 2,000+ | r < 0.1 (replicable) | Permutation | Marek et al., 2022 |
| Brain-behavior correlation (single ROI) | 80-200 | r = 0.2-0.3 | Uncorrected | Standard formula |
Registered Report Considerations
Registered reports require pre-specification of sample size with a formal power analysis. For neuroimaging registered reports:
Specify the primary analysis (whole-brain vs. ROI) and the corresponding power analysis method
Use simulation-based power when possible; if not, use ROI-based power with conservative effect size estimates
Pre-specify the multiple comparison correction method and document its impact on required N
Include sensitivity analysis: What is the minimum detectable effect size at the planned N?
State stopping rules: Pre-register the exact N and analysis plan; sequential analysis requires adjustment (Lakens, 2014)
Account for attrition: Specify expected exclusion rate (typically 10-20% for fMRI) and over-recruit
Domain insight: Reviewers will be suspicious of power analyses based on large effect sizes from small pilot studies. Use conservative (deflated) effect size estimates and show power curves across a range of plausible effect sizes.
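A sketch of a sensitivity analysis supporting such a submission; the planned N and the effect-size grid are illustrative:

```python
# Sketch: minimum detectable effect at the planned N, plus a power curve
# across a range of plausible (deflated) effect sizes.
import numpy as np
from statsmodels.stats.power import TTestPower

planned_n = 40
solver = TTestPower()

mde = solver.solve_power(nobs=planned_n, alpha=0.05, power=0.80,
                         alternative="two-sided")
print(f"minimum detectable d at N = {planned_n}: {mde:.2f}")

for d in np.arange(0.3, 0.85, 0.1):              # plausible effect-size range
    p = solver.power(effect_size=d, nobs=planned_n, alpha=0.05,
                     alternative="two-sided")
    print(f"d = {d:.1f}: power = {p:.2f}")
```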
Practical Workflow for Grant Applications
When Pilot Data Are Available
Run fMRIpower or NeuroPowerTools with pilot maps
Generate power curves showing power vs. N for the primary contrast and ROI
Select N that achieves 80-90% power for the primary analysis
Add 15-20% to the sample size for expected participant exclusions (e.g., to retain N = 40 after 15% attrition, recruit ceil(40 / 0.85) = 48)
Report: pilot study details, effect size estimates, power tool used, correction method, target power, final N
When No Pilot Data Are Available
Search NeuroVault for comparable paradigms; download unthresholded maps
Use NeuroPowerTools with the published map
Alternatively, estimate ROI-level effect sizes from published papers:
Extract t-values and convert to Cohen's d
Apply deflation (multiply by 0.5-0.75; Button et al., 2013)
Use G*Power for ROI-based power
As a last resort, use the benchmark table above with the analysis type closest to your planned study
Document all assumptions and state that the power analysis is based on estimated (not measured) effect sizes
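A compact worked chain for this route; the published values are hypothetical:

```python
# Sketch: published t-value -> Cohen's d -> deflation -> required N.
import numpy as np
from statsmodels.stats.power import TTestPower

t_pub, n_pub = 4.5, 18                   # hypothetical published one-sample result
d_pub = t_pub / np.sqrt(n_pub)           # d = t / sqrt(N), about 1.06
d_plan = 0.6 * d_pub                     # deflate (Button et al., 2013)

n_req = TTestPower().solve_power(effect_size=d_plan, alpha=0.05,
                                 power=0.80, alternative="two-sided")
print(f"planning d = {d_plan:.2f}; required N = {np.ceil(n_req):.0f}")
```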
Common Pitfalls
Using G*Power for whole-brain analyses: Standard power tools compute power for a single test and do not account for multiple comparison correction, so they dramatically overestimate power (equivalently, underestimate the required N) for whole-brain inference (Mumford & Nichols, 2008)
Trusting pilot study effect sizes: Small pilot studies (N < 20) produce inflated effect sizes. Always deflate by 25-50% (Button et al., 2013)
Ignoring the correction method: Power depends critically on whether you use voxelwise FWE, cluster-based, FDR, or permutation-based correction. Power at FDR q < 0.05 can be 2-3x higher than voxelwise FWE p < 0.05 for the same N
Conflating within-subject and between-subject power: Within-subject designs (one-sample t-test on contrast maps) are much more powerful than between-subject designs (two-sample t-test) because they eliminate between-subject variance (Mumford & Nichols, 2008)
Not accounting for attrition: In fMRI, 10-20% of data may be unusable due to motion, scanner artifacts, or task non-compliance. Over-recruit accordingly
Treating all regions equally: Power varies across the brain because effect sizes and noise vary spatially. Power at your primary ROI may be adequate even if whole-brain power is low
Assuming published N is adequate: Most published fMRI studies are underpowered (Button et al., 2013). Matching a published study's N does not guarantee adequate power
Not reporting sensitivity analysis: Always report the minimum detectable effect size at your planned N, in addition to the power estimate for the expected effect
Minimum Reporting Checklist
Source of effect size estimate (pilot data, published study, meta-analysis)
Effect size metric (Cohen's d, r, percent signal change) and value used
Whether effect size deflation was applied and the correction factor
Power analysis method (simulation-based, ROI-based analytical, benchmark-based)
Power analysis tool and version (fMRIpower, NeuroPowerTools, G*Power, custom simulation)
Number of simulations (for simulation-based approaches)
Multiple comparison correction method assumed in power analysis
Statistical threshold used (e.g., CDT p < 0.001, cluster p < 0.05 FWE)
Target power level (80% or 90%)
Planned total N and N per group (if applicable)
Expected attrition rate and over-recruitment plan
Sensitivity analysis (minimum detectable effect at planned N)
References
Button, K. S., Ioannidis, J. P. A., Mokrysz, C., et al. (2013). Power failure: Why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience, 14(5), 365-376.
Cox, R. W., Chen, G., Glen, D. R., Reynolds, R. C., & Taylor, P. A. (2017). FMRI clustering in AFNI: False-positive rates redux. Brain Connectivity, 7(3), 152-171.
Desmond, J. E., & Glover, G. H. (2002). Estimating sample size in functional MRI (fMRI) neuroimaging studies. Journal of Neuroscience Methods, 118(2), 115-128.
Durnez, J., Degryse, J., Moerkerke, B., et al. (2016). Power and sample size calculations for fMRI studies based on the prevalence of active peaks. bioRxiv, 049429.
Eklund, A., Nichols, T. E., & Knutsson, H. (2016). Cluster failure: Why fMRI inferences for spatial extent have inflated false-positive rates. PNAS, 113(28), 7900-7905.
Joyce, K. E., & Hayasaka, S. (2012). Development of PowerMap: A software package for statistical power calculation in neuroimaging studies. Neuroinformatics, 10(4), 351-365.
Laird, A. R., Fox, P. M., Price, C. J., et al. (2005). ALE meta-analysis: Controlling the false discovery rate and performing statistical contrasts. Human Brain Mapping, 25(1), 155-164.
Lakens, D. (2014). Performing high-powered studies efficiently with sequential analyses. European Journal of Social Psychology, 44(7), 701-710.
Marek, S., Tervo-Clemmens, B., Calabro, F. J., et al. (2022). Reproducible brain-wide association studies require thousands of individuals. Nature, 603(7902), 654-660.
Mumford, J. A., & Nichols, T. E. (2008). Power calculation for group fMRI studies accounting for arbitrary design and temporal autocorrelation. NeuroImage, 39(1), 261-268.
Poldrack, R. A., Baker, C. I., Durnez, J., et al. (2017). Scanning the horizon: Towards transparent and reproducible neuroimaging research. Nature Reviews Neuroscience, 18(2), 115-126.
Salo, T., Yarkoni, T., Nichols, T. E., et al. (2023). NiMARE: Neuroimaging Meta-Analysis Research Environment. NeuroImage, 268, 119862.
Smith, S. M., Miller, K. L., Salimi-Khorshidi, G., et al. (2011). Network modelling methods for FMRI. NeuroImage, 54(2), 875-891.
Thirion, B., Pinel, P., Meriaux, S., et al. (2007). Analysis of a large fMRI cohort: Statistical and methodological issues for group analyses. NeuroImage, 35(1), 105-120.
Turner, B. O., Paul, E. J., Miller, M. B., & Barbey, A. K. (2018). Small sample sizes reduce the replicability of task-based fMRI studies. Communications Biology, 1, 62.
Yarkoni, T., Poldrack, R. A., Nichols, T. E., Van Essen, D. C., & Wager, T. D. (2011). Large-scale automated synthesis of human functional neuroimaging data. Nature Methods, 8(8), 665-670.
See references/ for worked examples and simulation code templates.