Human-factors root cause analysis for incidents involving operators, crews, clinicians, or any human actors. Covers James Reason's Swiss Cheese Model and active/latent failure taxonomy, HFACS (Human Factors Analysis and Classification System), Just Culture algorithm (Marx/GAIN), Crew Resource Management (CRM) findings, and high-reliability organization (HRO) principles. Use when investigating incidents in aviation, healthcare, nuclear operations, emergency response, or any context where "operator error" is a tempting but shallow explanation and the real question is what organizational and design factors made the error foreseeable.
"The root cause was human error" is almost never a useful investigation finding. It stops the inquiry exactly where it should begin. This skill teaches the frameworks that treat human performance as a window into the system that produced it — so you end up redesigning the system rather than retraining the operator.
Sidney Dekker's principle (The Field Guide to Understanding "Human Error", 2014): if your RCA concludes with a person doing something wrong, you are roughly 20% of the way through the investigation. The remaining 80% is answering the question: why was that action reasonable from where the person was standing, with the information they had, under the pressures they faced?
This reframing is not soft on accountability. It is the opposite: it refuses to let organizational design off the hook by letting the nearest human take the blame.
James Reason (Managing the Risks of Organizational Accidents, 1997) proposed that defenses against accidents are arranged in layers, each with weaknesses ("holes"). An accident occurs when holes in successive layers align and a hazard passes through all defenses.
Reason's insight: latent failures are the more productive focus for investigation because they affect many future incidents, not just the one you're investigating. Fixing the sharp-end operator fixes one accident; fixing the latent condition fixes the class.
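A minimal sketch of that alignment logic, assuming a simple boolean notion of "breached" per layer; the layer names and failure lists are illustrative, not part of Reason's formulation:

```python
from dataclasses import dataclass

@dataclass
class DefenseLayer:
    """One slice of cheese: a barrier whose holes may be latent or active failures."""
    name: str
    latent_failures: list[str]   # weaknesses built into the layer long before the incident
    active_failures: list[str]   # errors or violations at the sharp end, on the day

    @property
    def breached(self) -> bool:
        # A layer is breached if any hole, latent or active, is open.
        return bool(self.latent_failures or self.active_failures)

def accident_trajectory(layers):
    """Reason's alignment condition: the hazard reaches a loss only if every
    successive layer is breached. Returns the breached layers' names, or None."""
    return [l.name for l in layers] if all(l.breached for l in layers) else None

# Illustrative incident: one intact layer is enough to stop the trajectory.
layers = [
    DefenseLayer("alarm design", ["threshold set above hazard level"], []),
    DefenseLayer("shift handover", ["no standard handover form"], ["abnormal reading not mentioned"]),
    DefenseLayer("operator response", [], []),
]
print(accident_trajectory(layers))  # None: the last defense held
```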
Perneger's cross-sectional study of 159 healthcare quality professionals found widespread misunderstanding of the Swiss Cheese Model even among practitioners who used it weekly, with little agreement on what the layers and holes actually represent.
Takeaway: use the model but verify that your team's mental model of it is correct. Test understanding before running the framework.
HFACS was developed by Shappell and Wiegmann at the U.S. Naval Safety Center (2000) as a taxonomy for classifying human-factors contributors to aviation mishaps. It is a direct operationalization of Reason's Swiss Cheese Model.
Organizational Influences
├── Resource Management
├── Organizational Climate
└── Organizational Process
Unsafe Supervision
├── Inadequate Supervision
├── Planned Inappropriate Operations
├── Failure to Correct a Known Problem
└── Supervisory Violations
Preconditions for Unsafe Acts
├── Environmental Factors
│   ├── Physical Environment
│   └── Technological Environment
├── Condition of Operators
│   ├── Adverse Mental States
│   ├── Adverse Physiological States
│   └── Physical/Mental Limitations
└── Personnel Factors
    ├── Crew Resource Management
    └── Personal Readiness
Unsafe Acts
├── Errors
│   ├── Decision Errors
│   ├── Skill-Based Errors
│   └── Perceptual Errors
└── Violations
    ├── Routine
    └── Exceptional
HFACS forces investigators to look beyond the proximate unsafe act. A well-run HFACS analysis on an aviation incident typically finds 3–5 latent conditions for every active error. Over a fleet of incidents, the taxonomy produces patterns that drive systemic interventions.
Use a published HFACS variant matched to your domain where one exists, or adapt the generic framework with domain-specific sub-categories.
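If you tag findings in tooling, the generic four-tier framework above maps naturally onto a nested structure. A sketch in Python; the validation helper and the latent/active count are illustrative conventions, not part of HFACS itself:

```python
# Generic HFACS taxonomy as a nested mapping: tier -> category -> sub-categories.
HFACS = {
    "Organizational Influences": {
        "Resource Management": [], "Organizational Climate": [], "Organizational Process": [],
    },
    "Unsafe Supervision": {
        "Inadequate Supervision": [], "Planned Inappropriate Operations": [],
        "Failure to Correct a Known Problem": [], "Supervisory Violations": [],
    },
    "Preconditions for Unsafe Acts": {
        "Environmental Factors": ["Physical Environment", "Technological Environment"],
        "Condition of Operators": ["Adverse Mental States", "Adverse Physiological States",
                                   "Physical/Mental Limitations"],
        "Personnel Factors": ["Crew Resource Management", "Personal Readiness"],
    },
    "Unsafe Acts": {
        "Errors": ["Decision Errors", "Skill-Based Errors", "Perceptual Errors"],
        "Violations": ["Routine", "Exceptional"],
    },
}

def validate_finding(tier: str, category: str) -> None:
    """Reject tags outside the taxonomy so fleet-level counts stay comparable."""
    if category not in HFACS.get(tier, {}):
        raise ValueError(f"{category!r} is not a {tier} category")

# Tagging each incident's contributing factors lets you count latent vs. active
# factors across many incidents; every tier except Unsafe Acts is latent.
findings = [
    ("Unsafe Acts", "Errors"),
    ("Preconditions for Unsafe Acts", "Condition of Operators"),
    ("Organizational Influences", "Resource Management"),
]
for tier, category in findings:
    validate_finding(tier, category)
latent = sum(1 for tier, _ in findings if tier != "Unsafe Acts")
print(f"{latent} latent, {len(findings) - latent} active")
```

Constraining tags to the taxonomy is what makes cross-incident pattern counts meaningful; free-text factor labels defeat the fleet-level analysis the framework is designed for.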
The Just Culture concept, developed by David Marx (Outcome Engenuity) and adopted by the Global Aviation Information Network (GAIN), provides a decision algorithm for distinguishing three classes of unsafe act and the response each one calls for:
| Behavior | Definition | Response |
|---|---|---|
| Human error | Unintended deviation from what was planned | Console; analyze system |
| At-risk behavior | Behavioral choice that increases risk where risk is unrecognized or believed justified | Coach; remove incentive |
| Reckless behavior | Conscious disregard of substantial and unjustifiable risk | Discipline |
For each person involved in an unsafe act, walk through the culpability questions in order: was the harm intended; was the person impaired or incapacitated; did they knowingly violate a procedure that was workable, intelligible, and routinely followed; and would a similarly trained peer, under the same pressures and with the same information, plausibly have done the same thing (the substitution test)?
A well-run Just Culture investigation protects operators from punitive responses to system-induced errors while maintaining accountability for reckless behavior. The algorithm is not a license to excuse everything — it is a disciplined classifier.
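A minimal sketch of the triage as a classifier, assuming yes/no answers to the culpability questions above; the field names, wording, and branch order are illustrative and not Marx's published algorithm:

```python
from dataclasses import dataclass

@dataclass
class Answers:
    """Yes/no answers gathered for one person involved in the unsafe act."""
    harm_intended: bool          # deliberate harm?
    knowing_violation: bool      # consciously disregarded a substantial, unjustifiable risk?
    risky_choice: bool           # a behavioral choice that increased risk (vs. a slip or lapse)?
    peer_would_do_same: bool     # substitution test: would a similarly trained peer have done the same?

def just_culture_triage(a: Answers) -> str:
    """Classify into the three responses from the table above."""
    if a.harm_intended or a.knowing_violation:
        return "reckless behavior -> discipline"
    if a.risky_choice and not a.peer_would_do_same:
        return "at-risk behavior -> coach; remove the incentive to drift"
    if a.risky_choice and a.peer_would_do_same:
        # When peers would act the same way, drift has been normalized; fix the system too.
        return "at-risk behavior -> coach; fix the conditions that normalized the drift"
    return "human error -> console; analyze the system"

# Example: a skipped double-check that most peers also skip under time pressure.
print(just_culture_triage(Answers(False, False, True, True)))
```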
Organizations that punish human error destroy their own incident-reporting systems: operators learn to hide errors, and latent conditions accumulate invisibly. This is well-documented in aviation (pre-ASRS era), healthcare (pre-AHRQ PSOs), and nuclear (pre-INPO). Just Culture is an organizational survival mechanism, not moral philosophy.
Crew Resource Management emerged from the 1979 NASA workshop on flight-crew errors and the 1978 United 173 crash (fuel exhaustion after the captain became preoccupied with a landing-gear indicator problem). Its premise: technical skill is not enough; the crew as a team must manage communication, workload, and decision-making under stress.
Meta-analytic reviews of CRM training have found consistent positive effects on aviation safety metrics (a 40–60% reduction in crew-related incidents after mature CRM adoption) but much smaller effects in healthcare, where CRM-style training was often delivered without the authority-gradient-flattening cultural change that makes it work. CRM training without culture change is classroom theatre.
AHRQ's TeamSTEPPS program adapted CRM for clinical settings, packaging it as four teachable team skills (communication, leadership, situation monitoring, and mutual support) and adding structured tools such as SBAR handoffs, briefs, huddles, debriefs, and the two-challenge rule for escalating safety concerns.
High-reliability organizations — aircraft carriers, nuclear power plants, wildland firefighting crews, air traffic control — operate complex systems with very low accident rates despite enormous intrinsic hazard. Weick and Sutcliffe (Managing the Unexpected, 2015) identified five cultural practices common to HROs: preoccupation with failure, reluctance to simplify interpretations, sensitivity to operations, commitment to resilience, and deference to expertise.
These five practices are not incident-investigation techniques, but they describe the organizational substrate in which human-factors RCA actually produces change. Without them, findings get filed and the same incident recurs.
Incident occurs
│
▼
Just Culture triage
├── Discipline (rare) — proceed to HR process
├── Coach (at-risk) — document drift pattern, feedback
└── Console (most cases) — proceed to RCA
│
▼
HFACS taxonomy — classify contributing factors
│
▼
Swiss Cheese mapping — identify latent conditions at each layer
│
▼
CRM / teamwork analysis — was this a coordination failure?
│
▼
HRO principle check — which cultural practices were absent?
│
▼
Corrective actions — target latent conditions and organizational practices,
not operator retraining (unless retraining addresses a genuine skill gap)
After an incident, the organization announces "additional training" and "updated procedures." Nothing else changes. The operator who had the incident is retrained, feels ashamed, and the next operator makes the same error six months later. Recovery: ask "what would make this error physically impossible or immediately correctable," not "how do we remind people to be careful."
A lengthy incident report is filed. It is not read, it is not connected to any corrective action, and it is discovered during the next incident when someone notices the two are related. Recovery: treat the report as a work item with owners and deadlines, not an archive artifact.
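One way to make that concrete, sketched in Python; the field names and the overdue check are illustrative, not a prescribed format:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class CorrectiveAction:
    """A finding is closed when its corrective action is done, not when the report is filed."""
    incident_id: str
    latent_condition: str      # the system condition being fixed, not the person being retrained
    owner: str
    due: date
    done: bool = False

def overdue(actions: list[CorrectiveAction], today: date) -> list[CorrectiveAction]:
    """Surface open items past their deadline instead of letting the report become an archive artifact."""
    return [a for a in actions if not a.done and a.due < today]

actions = [
    CorrectiveAction("INC-412", "alarm threshold above hazard level",
                     owner="controls team", due=date(2025, 3, 1)),
]
print(overdue(actions, date.today()))
```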
The team notices the system runs hot but doesn't fail. Hot operation becomes normal. When it eventually fails, the team discovers the safety margin was gone for months. Diane Vaughan (The Challenger Launch Decision, 1996) named this "normalization of deviance": the gradual process by which anomalies stop being treated as signals and become the accepted operating condition. Recovery: periodically re-anchor the team to first-principles safety limits, not observed operating ranges.
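A small sketch of that re-anchoring, assuming the limit comes from a first-principles analysis rather than from recent history; the numbers and names are illustrative:

```python
def margin_to_design_limit(readings: list[float], design_limit: float) -> float:
    """Margin against the engineered limit, not against what the team has gotten used to."""
    return design_limit - max(readings)

def drift_alert(readings: list[float], design_limit: float, required_margin: float) -> bool:
    # Alert when the margin erodes, even if nothing has failed yet.
    return margin_to_design_limit(readings, design_limit) < required_margin

# Running hot without failing: the margin is gone long before the failure.
week_of_temps_c = [88.0, 90.5, 91.0, 92.5]
print(drift_alert(week_of_temps_c, design_limit=95.0, required_margin=5.0))  # True
```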
A new operator is involved in an incident; the finding is "insufficient experience" and the corrective action is additional training before solo duty. This conceals the reality that the system is not safe for any operator during the learning curve, and the same incident will recur with the next new hire. Recovery: redesign the onboarding or the task itself.