
The Second Story Has Layers: How Deep Does Your Investigation Go?

If your incident investigation ends with “human error” as the root cause, you haven’t finished the investigation. You’ve just stopped asking questions.

“When you explain a why, you have to be in some framework that you allow something to be true — otherwise you’re perpetually asking why.”

— Richard Feynman

I’ve written before about a famous interview with the physicist Richard Feynman, in which he’s asked a deceptively simple question: why do magnets repel each other?

What makes the interview remarkable is not the answer — it’s Feynman’s exploration of why answering the question is so difficult. Every explanation requires a framework within which that explanation makes sense. You can say “ice is slippery” and that satisfies most people. But if you keep asking why — why is ice slippery, why does pressure melt it, why does water expand when it freezes — each layer opens into a different framework of understanding, each more interesting and more revealing than the last. At a certain depth, Feynman admits he simply cannot explain things in terms of anything more familiar. The explanation has reached the boundary of the framework.

His point is a subtle one: people stop at "ice is slippery" not because they're incurious, but because they don't have the deeper framework that would make them dissatisfied with the answer they already have.

I think the same is true in incident investigation. When a deviation is attributed to human error and the case is closed, it’s not always because people are lazy or blame-focused. It’s often because “human error” sits within a framework that feels complete. The explanation satisfies — until you develop the conceptual vocabulary to find it unsatisfying.

What follows is an attempt to sketch the layers below that surface, and to suggest that the “second story” — a term coined by Richard Cook and David Woods, and developed in their book Behind Human Error (Woods, Dekker, Cook, Johannesen & Sarter, 2010) — isn’t a single destination. It’s a direction of travel. And the further you go, the more interesting it gets.

The First Story — and Its Two Flavours

1.1 — Human Error as Full Stop

The crudest version of the first story is pure attribution. Someone made an error. The deviation is documented. The case is closed.

This is roughly equivalent to explaining why we slip on ice by saying the ice is slippery. It is technically not wrong. It is also, as an explanation, almost entirely without useful content.

Trevor Kletz — one of the founding figures of process safety — put it with characteristic directness:

“For a long time, people were saying that most accidents were due to human error and this is true in a sense but it’s not very helpful. It’s a bit like saying that falls are due to gravity.”

This is the level at which a lot of incident investigation stops. Julie Avery noted as much on the Risk Revolution podcast — a conversation in which we drew on Cook and Woods' framing: organisations receive 483 findings specifically because their root cause analyses stop at "human error", which signals to the regulator that curiosity has ended, not that learning has occurred.

1.2 — Naming the Psychological Mechanism

A step forward is to name what type of error occurred. Was it a slip of attention — doing the wrong action when the right intention was there? A lapse — forgetting to do something? A mistake — acting on a wrong diagnosis of the situation? Or a non-compliance — knowingly departing from a procedure?

This taxonomy, associated with James Reason’s work, is genuinely useful. It directs attention toward different types of intervention: slips may call for better interface design, lapses for better checklists, mistakes for better training or decision support, non-compliance for a harder look at why following the rules felt impossible or unreasonable.
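To make the direction of attention concrete, here is a minimal sketch of the taxonomy as a simple lookup. The categories follow the paragraph above; the intervention groupings and the function name are purely illustrative, not a prescribed mapping:

```python
# Illustrative only: Reason-style error types mapped to the intervention
# families each one tends to point toward, per the discussion above.
INTERVENTION_FAMILIES = {
    "slip": ["interface and layout design", "clearer labelling", "distraction management"],
    "lapse": ["checklists", "place-keeping aids", "interruption handling"],
    "mistake": ["training", "decision support", "better diagnostic information"],
    "non-compliance": ["procedure usability review", "workload and time-pressure review",
                       "understanding why the rule felt unworkable"],
}

def candidate_interventions(error_type: str) -> list[str]:
    """Return the intervention families typically associated with an error type."""
    return INTERVENTION_FAMILIES.get(error_type.lower(), ["re-examine the classification"])

print(candidate_interventions("slip"))
```

Even as a sketch, the lookup makes its own limits obvious: it tells you where to look, not what the surrounding conditions were. That is exactly the trap described next.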

But here lies a trap. Naming the mechanism can feel like second story thinking — it feels analytical, systematic, thorough. And it is a step forward. But if we stop at naming the mechanism without asking what influenced it, we remain in first story territory. The question is still fundamentally: what did this person do, and what went wrong in their head?

The mechanism, understood properly, is actually an invitation to go further. Understanding that something was a slip of attention doesn’t close the question — it opens it. What shaped the attentional conditions in that moment? What made this step prone to that particular type of failure? This is where the second story begins: switching from the individual and their psychology to the context that surrounded them.

The Second Story — and Its Layers

2.1 — Performance Influencing Factors and Error Traps

The entry point to the second story is Performance Influencing Factors (PIFs): the conditions — in the task, the environment, the organisation, and the individual — that either increase or decrease the likelihood of a particular error occurring.

One important clarification before we go further: PIFs are not the same as error traps, and conflating the two is a common misconception. PIFs are neutral in direction. They can be negative — poor lighting, ambiguous labelling, high workload, inadequate procedures, time pressure — making failure more likely. But they can equally be positive — clear and distinct labels, low background noise, good handover protocols, unhurried task conditions — making success more likely and error less so.

An error trap is something more specific: a configuration of negative PIFs so severe that an error becomes probable, even predictable. As Julie Avery put it on the podcast:

“We need to stop talking about human error and start talking about error traps.”

— Julie Avery

Consider two levers that are unlabelled, adjacent, and operate in the same way. The risk of confusion isn’t hypothetical — it is baked into the design. The question isn’t whether someone could confuse them. It’s whether the system has made confusion likely. That’s an error trap. The person who eventually confuses them isn’t the cause of the problem — they are the final, predictable step in a chain that started at the drawing board.

PIF analysis is well-established within SCTA methodology, and is one of the things that distinguishes a structured assessment from a cursory one. For those wanting more detail, we have covered PIFs in depth in several previous blogs, and they form a core part of both proactive SCTA work and retrospective investigation through TABIE.

2.1a — The Summative Model

Most practitioners who work with PIFs operate, often without realising it, within a summative model: the more negative PIFs are present, the more degraded the system becomes, and the higher the probability of failure. Think of the Swiss Cheese model: more holes in more slices, more chance of something getting through. The system is under chronic stress. Conditions accumulate. Eventually something gives.

This model has real explanatory power. It describes what happens when organisations are persistently understaffed, when workload is chronically elevated, when fatigue is endemic, when procedures have been allowed to decay. And it generates sensible interventions: reduce workload, fix the procedures, address the fatigue, improve the culture.
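To see why accumulation matters, here is a minimal sketch of the summative intuition, assuming (purely for illustration) that barriers fail independently and that each additional negative PIF nudges every barrier's failure probability upward. The numbers are invented; only the shape of the relationship is the point.

```python
# Illustrative Swiss Cheese arithmetic: invented numbers, independence assumed.
def breach_probability(n_barriers: int, baseline_hole_prob: float,
                       n_negative_pifs: int, pif_penalty: float) -> float:
    """Probability that a hazard finds a hole in every barrier."""
    hole_prob = min(1.0, baseline_hole_prob + n_negative_pifs * pif_penalty)
    return hole_prob ** n_barriers

for pifs in (0, 2, 4, 6):
    p = breach_probability(n_barriers=4, baseline_hole_prob=0.05,
                           n_negative_pifs=pifs, pif_penalty=0.05)
    print(f"{pifs} negative PIFs -> breach probability ~ {p:.6f}")
```

Even on toy numbers the risk climbs steeply as negative conditions pile up, which is the intuition the summative model captures, and why its corrective actions target general system health.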

2.1b — The Interactive Model

But there is a second model that the summative approach misses, and it matters particularly for understanding how low-frequency, high-consequence events occur. Call it the interactive model.

The interactive model says you do not need a generally degraded system for a serious failure to happen. You need a specific, brief coincidence of vulnerabilities and events — a configuration that creates a pathway through even relatively robust defences. The individual elements might each seem unremarkable in isolation. Together, at one particular moment, they align.

Consider two versions of the same event type — a missed vehicle at the end of a runway:

Summative account: The facility was chronically understaffed. Controllers were carrying elevated workload across the shift. Fatigue levels were higher than they should have been. The system was under general stress, and eventually something went wrong.

Interactive account: An unscheduled emergency created an immediate diversion of attention. The shift supervisor was temporarily away from their position. Two simultaneous radio transmissions on adjacent frequencies cancelled each other out, so one message went unheard. A nuisance alarm sounded at precisely that moment. Heavy morning fog had reduced visibility to near zero. These five specific factors coincided in a ninety-second window. The vehicle was not seen.

Both accounts may be true simultaneously. The summative conditions may well have been present. But the interactive account explains why this event happened at this moment — and points toward vulnerabilities that no summative analysis would have identified, because no individual element looked critical on its own.

Think of the Swiss Cheese model again: we don't need lots of large holes. In fact there might be only a few, but if they line up there is still a route through to a bad outcome.
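The interactive picture can be sketched in the same spirit, but the arithmetic runs on conjunction within a short window rather than accumulation over time. The probabilities below are invented and treated as independent purely for illustration; the point is that conditions which each look unremarkable on their own can still be expected to align occasionally over a long period of operation.

```python
# Illustrative conjunction arithmetic: invented probabilities, independence assumed.
# Five specific conditions, each present in only a small fraction of 90-second
# windows, considered over a year of continuous operation.
condition_probs = {
    "attention diverted by an unscheduled emergency": 0.02,
    "supervisor away from position": 0.10,
    "radio transmission stepped on": 0.01,
    "nuisance alarm sounding": 0.05,
    "visibility near zero": 0.05,
}

p_align = 1.0
for p in condition_probs.values():
    p_align *= p  # chance all five coincide in a single window

windows_per_year = 365 * 24 * 3600 // 90
p_at_least_once = 1 - (1 - p_align) ** windows_per_year

print(f"Chance of alignment in any one window: {p_align:.1e}")
print(f"Chance of at least one alignment in a year: {p_at_least_once:.1%}")
```

None of the individual probabilities would raise an eyebrow in a summative review of general system health, yet the chance of at least one alignment over a year is not negligible. That is why interactions deserve their own place in the analysis.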

The practical implication for investigation is significant. If you only have a summative model in your toolkit, your corrective and preventive actions (CAPAs) will target the general conditions: staffing, shift patterns, alarm rationalisation. All worthwhile. But you may miss the interaction entirely — and interactions, in complex sociotechnical systems, are often where the real risk lives.

2.2 — Latent Conditions

Moving deeper still takes us from conditions present at the moment of failure to conditions established weeks, months, or even years earlier — decisions and designs that shaped what was possible on the day.

This is the territory of the Accident Sequence and Precursor (ASAP) model, which maps the full causal chain from precursor conditions through initiating events to failed barriers and consequences. Rather than beginning with what happened at the sharp end, ASAP asks: what decisions, designs, and organisational choices created the preconditions? What was already in place, and who had made those choices, long before anyone arrived for work that morning?

We have explored the ASAP model in depth through our Herald of Free Enterprise blog series (Part 1 and Part 2), where the analysis reveals how an August 1986 memo about departure times, design decisions about the absence of door position indicators on the bridge, and an embedded culture of time pressure created a system primed for failure — long before any individual made a mistake on the day. The sharp-end failures did not cause the disaster in any meaningful independent sense. They revealed latent vulnerabilities that were already there.

This layer itself has two distinct depths worth separating.

2.2a — Within System Decisions, e.g. Leadership/management level

The first concerns decisions taken within the operating system — a budget decision that reduced training investment, a risk assessment recommendation that was noted but not acted on, a staffing model that left a critical role under-resourced. These are decisions made by people within the organisation, often under incomplete information or competing pressures, that created the conditions for failure. They are latent not because they are hidden, but because their consequences only become visible when something goes wrong.

This layer asks: who decided what, when, and on what basis? And did those decisions increase the vulnerability of the system?

2.2b — On System Deficiencies, e.g. Applying external HFRM standards

The second, deeper level asks whether the organisation’s human factors risk management framework was itself inadequate — not just that a specific decision was made badly, but that the system for anticipating and managing foreseeable human error potential was deficient by design or omission.

This is where structured frameworks become essential as a retrospective lens. The six topics of the Human Factors Delivery Guide for COMAH sites provide a comprehensive picture of what good human factors risk management looks like: proactive risk assessment, human factors in design, critical communications, procedures, competence, and organisational factors. Applying these topics retrospectively to an incident — asking which of them, had they been properly implemented, might have broken the chain — transforms investigation from a narrative exercise into a systematic organisational assessment.
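As a sketch of what that retrospective lens can look like in practice, the snippet below tags hypothetical incident findings against the six topics named above and reports the topics no finding has touched. The findings and the tagging are invented for illustration; this is not a description of the TABIE HFRM Module introduced in the next paragraph.

```python
# Illustrative only: mapping incident findings onto the six Delivery Guide
# topics and flagging topics that no finding has touched.
DELIVERY_GUIDE_TOPICS = [
    "proactive risk assessment",
    "human factors in design",
    "critical communications",
    "procedures",
    "competence",
    "organisational factors",
]

# Hypothetical findings, each tagged with the topics it relates to.
findings = [
    {"text": "No SCTA performed for the transfer task", "topics": ["proactive risk assessment"]},
    {"text": "Shift handover relied on verbal recall alone", "topics": ["critical communications"]},
    {"text": "Procedure last reviewed eight years ago", "topics": ["procedures"]},
]

covered = {topic for f in findings for topic in f["topics"]}
gaps = [topic for topic in DELIVERY_GUIDE_TOPICS if topic not in covered]

print("Topics raised by the findings:", sorted(covered))
print("Topics with nothing against them:", gaps)
```

A topic with nothing against it can mean a genuine strength or simply an unexamined area; either way, making the gap explicit is what turns the investigation into a systematic organisational assessment.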

We have been developing this approach into what we call the Human Factors Risk Management (HFRM) Module within the TABIE toolbox — a structured mechanism for mapping incident findings onto recognised good practice frameworks, identifying systemic gaps, and generating specific recommendations. The module is described in detail in our recent blog on the evolving TABIE toolbox.

So — How Deep Does Your Second Story Go?

The maturity index below acts as a roadmap for moving from surface attribution toward systemic learning. Most organisations will recognise themselves somewhere in the first two or three levels. The later levels represent genuine frontier territory for most industries.

1.1 — First Story: Blaming the Person
Description: Human error recorded as the cause; investigation stops.
Example: "Operator failed to follow procedure" written as root cause. Deviation closed.
Challenges: People lack the framework to find this unsatisfying. It feels complete.
Opportunity: Familiar language — can be a bridge, if we choose to cross it.

1.2 — First Story: Naming the Mechanism
Description: The type of error is identified: slip, lapse, mistake, or non-compliance.
Example: "Operator made a mis-selection" — attention slip noted in the investigation.
Challenges: Can feel like deep analysis while remaining individual-focused. The mechanism without its context.
Opportunity: The mechanism is an invitation. A slip of attention asks: what shaped the attentional conditions?

2.1a — Second Story: Summative PIFs
Description: Conditions increasing or decreasing error likelihood are identified. More negative PIFs accumulate to raise system risk.
Example: Chronic understaffing, elevated workload, degraded procedures. System under general stress.
Challenges: Often treated as the destination, not a stepping stone. Tends to generate generic corrective actions.
Opportunity: Moves from individual to system. Identifies structural conditions to address proactively.

2.1b — Second Story: Interactive PIFs
Description: A specific coincidence of vulnerabilities at a particular moment creates a pathway to failure.
Example: Emergency + absent supervisor + cancelled radio transmission + nuisance alarm + fog — all in 90 seconds.
Challenges: Requires richer investigation; no single element looks critical in isolation.
Opportunity: Explains why this event happened now. Reveals vulnerabilities invisible to summative analysis alone.

2.2a — Second Story: Latent Conditions (Within System)
Description: Organisational decisions made long before the event that created foreseeable vulnerabilities.
Example: Budget decision reducing training investment; a risk assessment recommendation left unacted.
Challenges: Potentially uncomfortable — traces failure to management decisions; may implicate senior figures.
Opportunity: Connects sharp-end failures to blunt-end accountability. Enables strategic rather than tactical improvement.

2.2b — Second Story: Latent Conditions (On System)
Description: Assessment of whether the organisation's framework for anticipating and managing foreseeable human error was itself adequate.
Example: No proactive SCTA programme; no structured approach to critical task identification; HF not integrated into the investigation process.
Challenges: Requires a structured HF delivery guide to assess against (e.g. COMAH); rarely part of any investigation process outside of HSE regulators.
Opportunity: Systematically builds in and matures a Human Factors Risk Management (HFRM) operating system.

Most organisations are operating somewhere between 1.1 and 2.1a. The lower layers require methodological infrastructure and conceptual vocabulary that many industries don’t yet have in any standardised form.

Feynman’s point about magnets is worth returning to here. He didn’t criticise the interviewer for being satisfied with a shallow explanation. He just made visible how much more there was, if you had the framework to look for it.

The second story has layers. The further down you go, the more interesting it gets.

Go Deeper

If this way of thinking about investigation resonates with you, there are two places to take it further.

Learning from Incidents course — We are developing a structured course that works through these layers in depth, using real case studies and applying the TABIE methodology to build genuine investigative capability. If you’d like to be notified when it launches, you can register your interest here: the.humanreliabilityacademy.com/courses/LfI.

Human Reliability Hub — Join our growing community of practitioners to discuss these ideas, share experiences of going deeper in investigation, and connect with others working on the same challenges. You can find us at: human-reliability.mn.co.


Acknowledgements

This blog post was drafted with the assistance of Claude (Anthropic). Claude supported the structuring of ideas, development of prose, and organisation of content across multiple drafting iterations. The concepts, domain expertise, and judgements expressed are the author’s own. The author remains responsible for the final content and any errors or omissions.

References

Woods, D.D., Dekker, S., Cook, R., Johannesen, L. and Sarter, N. (2010). Behind Human Error (2nd ed.). Farnham: Ashgate.
