Comprehensive guide to sample size estimation for clinical trials including formulas for different endpoints, adjustment for dropout, interim analysis impact, non-inferiority designs, and software tools.
Every sample size calculation requires four fundamental elements:
1. Significance Level (Alpha)
Probability of Type I error (rejecting a true null hypothesis)
Convention: alpha = 0.05 (two-sided) for most clinical trials
One-sided alpha = 0.025 is equivalent to two-sided alpha = 0.05
Regulatory trials almost always use two-sided alpha = 0.05
For multiple primary endpoints or interim analyses, alpha must be adjusted (split or spent)
2. Statistical Power (1 - Beta)
Probability of detecting a true effect (rejecting a false null hypothesis)
Convention: 80% (beta = 0.20) for most trials; 90% (beta = 0.10) for pivotal regulatory trials
Higher power requires a larger sample size: at two-sided alpha = 0.05, 90% power requires roughly one-third (about 34%) more subjects than 80% power
Underpowered trials are both scientifically uninformative and ethically problematic: they expose participants to research risk without being able to answer the research question
3. Effect Size
The minimum clinically important difference (MCID) the trial is designed to detect
Should NOT be based on what is "expected" from preliminary data; it should reflect the smallest effect that would change clinical practice
Overly optimistic effect sizes lead to underpowered studies
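The link between alpha, power, and sample size described above can be checked numerically. The following is a minimal sketch, assuming a two-sided test and the normal approximation; the helper name `z_multiplier` is hypothetical and is introduced only for illustration. It computes the term (Z_alpha/2 + Z_beta)^2, which scales the required sample size in the standard formulas.

```python
from statistics import NormalDist


def z_multiplier(alpha: float = 0.05, power: float = 0.80) -> float:
    """Return (Z_{alpha/2} + Z_beta)^2 for a two-sided test (normal approximation)."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value for two-sided alpha
    z_beta = std_normal.inv_cdf(power)           # quantile corresponding to the power
    return (z_alpha + z_beta) ** 2


mult_80 = z_multiplier(power=0.80)
mult_90 = z_multiplier(power=0.90)
ratio = mult_90 / mult_80  # relative sample size, 90% vs 80% power

print(f"Multiplier at 80% power: {mult_80:.2f}")
print(f"Multiplier at 90% power: {mult_90:.2f}")
print(f"Relative sample size (90% vs 80%): {ratio:.2f}")
```

With everything else held fixed, the ratio of the two multipliers is the relative sample size, which is where the "roughly one-third more" rule of thumb comes from.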
For 1:1 randomization detecting HR = 0.75 with 80% power:
d = 4 * (1.96 + 0.842)^2 / (ln(0.75))^2 = 4 * 7.85 / 0.0828 = 380 events (approximately)
Total participants: N = d / P(event), where P(event) is the overall probability that a participant has an event during follow-up. P(event) depends on the accrual rate, follow-up duration, event rate, and censoring pattern, so estimating it requires simulation or the Lachin-Foulkes formula.
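The event calculation above can be reproduced with a short script. This is a sketch of the Schoenfeld approximation for a two-sided log-rank test; the function name `schoenfeld_events` and the 60% overall event probability used to convert events into participants are illustrative assumptions, not values from this guide.

```python
import math
from statistics import NormalDist


def schoenfeld_events(hr: float, alpha: float = 0.05, power: float = 0.80,
                      alloc_ratio: float = 1.0) -> float:
    """Approximate number of events needed to detect hazard ratio `hr`."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_beta = std_normal.inv_cdf(power)
    p = alloc_ratio / (1 + alloc_ratio)  # fraction randomized to the experimental arm
    # With 1:1 allocation p * (1 - p) = 1/4, which gives the factor of 4 in the text.
    return (z_alpha + z_beta) ** 2 / (p * (1 - p) * math.log(hr) ** 2)


events = schoenfeld_events(hr=0.75)
assumed_event_probability = 0.60  # hypothetical value, for illustration only
total_n = events / assumed_event_probability

print(f"Events required: {math.ceil(events)}")
print(f"Participants required (assuming 60% have an event): {math.ceil(total_n)}")
```

The conversion from events to participants is only as good as the assumed event probability; in practice that figure would come from the accrual and follow-up model rather than a single fixed number.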
Chi-Square and Fisher's Exact Test
For contingency tables, the sample size depends on the expected cell proportions and the desired detectable difference. Use specialized formulas or software.
Paired Designs (Crossover, Matched)
n pairs = (Z_alpha/2 + Z_beta)^2 * sigma_d^2 / delta^2
Where sigma_d is the standard deviation of the within-pair differences. Since sigma_d = sigma * sqrt(2 * (1 - rho)), paired designs require substantially fewer subjects when the within-subject correlation (rho) is high.
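The effect of the within-subject correlation on the paired formula can be seen in a small sketch. The function name `paired_n` and the example inputs (delta = 5, sigma = 10) are hypothetical, chosen only to show how the required number of pairs falls as rho rises.

```python
import math
from statistics import NormalDist


def paired_n(delta: float, sigma: float, rho: float,
             alpha: float = 0.05, power: float = 0.80) -> int:
    """Number of pairs for a two-sided paired comparison (normal approximation)."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_beta = std_normal.inv_cdf(power)
    # SD of within-pair differences, from the per-measurement SD and correlation
    sigma_d = sigma * math.sqrt(2 * (1 - rho))
    return math.ceil((z_alpha + z_beta) ** 2 * sigma_d ** 2 / delta ** 2)


for rho in (0.0, 0.5, 0.8):
    print(f"rho = {rho}: {paired_n(delta=5, sigma=10, rho=rho)} pairs")
```

Under these assumed inputs, moving from rho = 0 to rho = 0.8 cuts the required number of pairs by roughly a factor of five, because the variance of the differences is proportional to (1 - rho).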
Adjustments
Dropout and Loss to Follow-Up
Inflate the calculated sample size to account for anticipated attrition:
N_adjusted = N / (1 - dropout_rate)
For 20% expected dropout: multiply by 1/(1-0.20) = 1.25
Use the dropout rate observed in similar prior trials
Consider differential dropout between arms (more complex adjustment)
Distinguish between dropout (lost entirely) and non-compliance (still followed)
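As a sketch of the dropout inflation above, the helper below applies N / (1 - dropout_rate) and rounds up. The function name `inflate_for_dropout` and the starting sample size of 633 are hypothetical; only the 20% dropout rate comes from the example in the text.

```python
import math


def inflate_for_dropout(n: float, dropout_rate: float) -> int:
    """Inflate a sample size so that n evaluable participants remain after dropout."""
    if not 0 <= dropout_rate < 1:
        raise ValueError("dropout_rate must be in [0, 1)")
    # Divide by the retained fraction; multiplying by (1 + rate) would under-inflate.
    return math.ceil(n / (1 - dropout_rate))


n_adjusted = inflate_for_dropout(633, 0.20)
print(f"Enroll {n_adjusted} to retain about 633 after 20% dropout")
```

Note that dividing by (1 - dropout_rate) is not the same as adding 20%: adding 20% to 633 gives 760, which would leave only about 608 participants after 20% attrition.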
Unequal Allocation
For k:1 randomization (e.g., 2:1 experimental:control):
Total N increases compared to 1:1 allocation
Inflation factor for total N relative to 1:1 allocation: (1 + 1/k)^2 / (4/k), equivalently (k + 1)^2 / (4k)
2:1 allocation requires approximately 12.5% more total participants than 1:1
3:1 allocation requires approximately 33% more
Used when: more safety data needed on experimental arm, ethical preference, patient preference
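The factor (1 + 1/k)^2 / (4/k) given above can be tabulated for common allocation ratios. This is a minimal sketch; the function name `allocation_inflation` is hypothetical.

```python
def allocation_inflation(k: float) -> float:
    """Relative total sample size for k:1 allocation compared with 1:1."""
    if k <= 0:
        raise ValueError("k must be positive")
    return (1 + 1 / k) ** 2 / (4 / k)


for k in (1, 2, 3):
    factor = allocation_inflation(k)
    print(f"{k}:1 allocation: total N x {factor:.3f}")
```

The factor equals 1 at k = 1 and grows slowly at first, which is why 2:1 designs are often considered an acceptable trade-off while more extreme ratios become expensive quickly.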
Clustering (Cluster Randomized Trials)
Multiply the individual-level sample size by the design effect:
DE = 1 + (m - 1) * ICC
Where m = average cluster size and ICC = intra-cluster correlation coefficient.
Inflated sample size: N_cluster = N_individual * DE
Number of clusters per arm: k = (per-arm N_cluster) / m, rounded up
The number of clusters matters more than cluster size for power. Minimum: typically 6-8 clusters per arm.
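The design-effect adjustment above can be sketched as follows. The function name `cluster_sample_size` and the inputs (190 participants per arm from an individually randomized calculation, clusters of 20, ICC of 0.05) are hypothetical, and the sketch assumes equal cluster sizes.

```python
import math


def cluster_sample_size(n_individual: float, cluster_size: int, icc: float):
    """Return (design effect, clusters needed) for one arm, assuming equal cluster sizes."""
    design_effect = 1 + (cluster_size - 1) * icc
    n_inflated = n_individual * design_effect         # participants after inflation
    clusters = math.ceil(n_inflated / cluster_size)   # round up to whole clusters
    return design_effect, clusters


de, clusters = cluster_sample_size(n_individual=190, cluster_size=20, icc=0.05)
print(f"Design effect: {de:.2f}")
print(f"Clusters per arm: {clusters}")
```

Even a small ICC nearly doubles the sample size here because the design effect grows with cluster size, which is consistent with the point that adding clusters helps power more than enlarging them.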
Stratified Analyses
If the primary analysis uses stratified methods (e.g., stratified log-rank test, CMH test), sample size should account for stratification. Generally, stratification on prognostic factors improves power slightly.
Interim Analysis Impact
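Repeated interim tests for efficacy inflate the overall Type I error rate unless the design allocates (spends) alpha across the looks. Futility analyses do not inflate Type I error, but stopping for futility reduces power, so futility rules should also be accounted for in the design.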
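To illustrate why unadjusted efficacy looks are a problem, the simulation below estimates the overall Type I error when the same fixed-sample critical value is applied at every look. It is a sketch under simplifying assumptions: equally spaced looks, a normally distributed test statistic, and no treatment effect. The function name `simulate_type1`, the seed, and the number of simulated trials are all arbitrary choices.

```python
import math
import random
from statistics import NormalDist


def simulate_type1(looks: int, n_sims: int = 20000,
                   alpha: float = 0.05, seed: int = 1) -> float:
    """Estimate overall two-sided Type I error with unadjusted repeated testing."""
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1 - alpha / 2)  # fixed-sample critical value
    rejections = 0
    for _ in range(n_sims):
        cum_sum = 0.0
        for k in range(1, looks + 1):
            # Under H0 each stage adds one independent standard-normal increment
            cum_sum += rng.gauss(0.0, 1.0)
            z_k = cum_sum / math.sqrt(k)  # Z statistic on all data accumulated so far
            if abs(z_k) > crit:
                rejections += 1
                break  # trial stops at the first "significant" look
    return rejections / n_sims


rate_single = simulate_type1(looks=1)
rate_five = simulate_type1(looks=5)
print(f"1 look:  estimated Type I error = {rate_single:.3f}")
print(f"5 looks: estimated Type I error = {rate_five:.3f}")
```

With a single look the estimate should sit near the nominal 0.05, while five unadjusted looks push it well above that level, which is the inflation that group sequential boundaries are designed to remove.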
Group Sequential Methods
O'Brien-Fleming boundaries: very stringent at early looks, with a final critical value close to that of a fixed-sample design. Minimal inflation of total sample size (< 3% for up to 5 looks).
Pocock boundaries: the same critical value at every look. Requires more inflation (up to 20-30% additional subjects).
Lan-DeMets alpha-spending: flexible timing of interim looks while controlling overall alpha.
Information Fraction
The proportion of total planned information (events, subjects) accumulated at each interim analysis. Group sequential boundaries are typically defined in terms of information fractions.
Sample Size Adjustment
When planning interim analyses, inflate the fixed-sample-size calculation by the inflation factor associated with the spending function:
O'Brien-Fleming with 1 interim: multiply by approximately 1.01
Pocock with 1 interim: multiply by approximately 1.10
Sample Size for Non-Inferiority
n per group = (Z_alpha + Z_beta)^2 * 2 * sigma^2 / (delta - delta_0)^2