Understanding Work-as-Done Before Automating
Several years ago, I worked on a research project with Professor Mark Sujan exploring AI and automation in healthcare. We were applying SHERPA and other methods to understand how these technologies might be integrated into Intensive Care Units. The technology was just around the corner, but it wasn't yet clear how it might be implemented, what role it would play, or indeed whether patients would be happy with it.
When we mapped the actual work-as-done, we uncovered something revealing. Sometimes nurses would administer medications without a prescription. This was against hospital rules, but it happened when medication was urgent and doctors were unavailable. For experienced nurses dealing with non-dangerous drugs, the risk of inaction often outweighed the risk of non-compliance. There were many nuances and caveats to this: if it was a continuation of a drug the patient was already receiving, if the drug was less potent, if the nurse was experienced, then it was perceived as lower risk.
This insight opened up design ideas and possibilities. We could create a system where a nurse documented they were administering an unprescribed drug, which would be flagged immediately for the doctor to review and authorise. A sensible solution that accommodated reality.
But here’s the problem: the nurses told us this workaround could not be designed into the system, and they felt uncomfortable sharing it with management. If management knew about the practice, they would “double down and exclude it” rather than acknowledge the messy reality of urgent care. The formal system simply couldn’t cope with how work actually got done.
This story illustrates a fundamental challenge with AI and automation: when systems are designed around work-as-imagined rather than work-as-done, we risk creating technology that disrupts safe practice rather than enhancing it. As AI accelerates into our workplaces – heralded as the next Industrial Revolution – we need systematic approaches to understand and manage these human-AI hybrid systems before deployment, not after failure.
The AI could straitjacket work-as-done and erode the resilience of the people doing the work.
The Ironies of Automation: Lessons from 40 Years Ago
In 1983, Lisanne Bainbridge published “Ironies of Automation,” a paper that remains remarkably relevant four decades later. Her key insight? Automation doesn’t eliminate the need for human operators; it transforms them into monitors. The problem is that monitoring is precisely what humans are poor at, especially when events requiring intervention are rare.
Professor Peter Hancock captured this perfectly: “If you design systems that rarely require a response, people will rarely respond when required.”
Consider the promise of autonomous vehicles. Drivers are expected to monitor constantly and be ready to take over at any moment. But this is an impossible ask. If someone is fully alert and monitoring, they might as well be driving themselves. If the system handles driving competently for hours on end, drivers will naturally – and understandably – drift to other thoughts and tasks. Then, when the rare critical moment arrives requiring human intervention, we expect them to rapidly diagnose the situation, understand what the AI has done, and take corrective action. In the future, this will be compounded by deskilling after months or years of not actively driving.
This isn’t just about overload at crisis points – a traditional human factors concern. It’s about underload. Boredom, disengagement, and vigilance decrement are real problems when automation handles routine work. Yet in a crisis, we expect operators to switch instantly from passive monitor to active problem-solver, often with degraded skills and incomplete understanding of the situation.
This is the handback problem, and it applies across industries: from process control rooms to pharmaceutical manufacturing, from aviation to healthcare.
SHERPA for Hybrid Systems: Extending the Framework
The good news is that SHERPA – the Systematic Human Error Reduction and Prediction Approach – already provides a foundation for analysing these hybrid systems. SHERPA has always allowed specification of non-human agents; we simply need to extend how we think about task analysis, failure modes, and performance influencing factors to explicitly address AI and automation.
Task Analysis: Making the Joint System Visible
When applying SHERPA to human-AI systems, we need to think about task analysis in two complementary modes:
Mode 1: Understanding Work-as-Done
Before introducing any AI or automation, we must first understand how work actually gets done. This isn’t simply about reading procedures or observing formal processes – it’s about uncovering the reality of practice, including the informal adaptations, workarounds, and tacit knowledge that make systems function safely.
This is where our ICU medication example becomes critical. If we had designed an AI prescription system based solely on formal hospital rules, we would have created a system that couldn’t accommodate urgent medication administration when doctors weren’t immediately available. The AI might have blocked or flagged these actions as violations, forcing nurses either to delay essential treatment or to find ways to circumvent the system entirely – potentially creating new and more dangerous workarounds.
By understanding work-as-done first, we can identify:
- Where current practice diverges from formal procedures and why
- What informal knowledge and adaptations exist
- Where AI might genuinely help versus where it might obstruct safe practice
- What hidden assumptions might undermine AI system design
Without this foundation, we risk designing AI systems for an idealised world that doesn’t exist, potentially straitjacketing the very practices that keep systems safe.
Mode 2: Mapping the Joint Human-AI System
Once we understand current work, we can then map how human actors and AI agents might work together. This involves extending Hierarchical Task Analysis (HTA) to explicitly show both human and AI responsibilities, their interaction points, and information flows between agents.
In a traditional HTA, we might document steps like:
- Operator checks reading
- Operator adjusts valve
- Supervisor approves change
In an AI-augmented system, we need to show:
- AI system monitors reading continuously
- AI alerts operator to deviation
- Operator reviews AI recommendation
- Operator confirms or overrides action
- AI executes adjustment
- Operator monitors result
- Supervisor reviews AI log and approves
This explicit mapping reveals the joint work system: who (human or AI) is responsible for what, where information flows between agents, and crucially, where humans must monitor, interpret, decide, or intervene. It makes visible the interaction points where new failure modes can emerge.
If this is a design exercise, there’s scope for adjusting these interactions iteratively – perhaps moving certain decisions to different agents, introducing additional checks, or redesigning information flows – all while considering how these changes affect overall system performance and safety.
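To make this kind of mapping easier to interrogate, it can help to capture the joint task as data rather than prose. Below is a minimal sketch in Python – purely illustrative; the Step structure and field names are our own, not part of SHERPA’s formal notation – that tags each step with the responsible agent and lists the handover points where responsibility passes between human and AI.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Step:
    """One step in an extended HTA for a joint human-AI work system."""
    description: str
    agent: str  # "human" or "AI"

def is_interaction_point(previous: Optional[Step], current: Step) -> bool:
    """A handover: responsibility passes from one kind of agent to the other."""
    return previous is not None and previous.agent != current.agent

# The valve-adjustment example from above, tagged by agent.
steps = [
    Step("AI system monitors reading continuously", "AI"),
    Step("AI alerts operator to deviation", "AI"),
    Step("Operator reviews AI recommendation", "human"),
    Step("Operator confirms or overrides action", "human"),
    Step("AI executes adjustment", "AI"),
    Step("Operator monitors result", "human"),
    Step("Supervisor reviews AI log and approves", "human"),
]

# List the handover points - prime candidates for new failure modes.
for previous, current in zip([None] + steps[:-1], steps):
    if is_interaction_point(previous, current):
        print(f"Handover: {previous.description!r} -> {current.description!r}")
```

Run against the example above, the sketch flags the points where the alert is handed to the operator, where the confirmed action is handed back to the AI, and where the result comes back to the operator for monitoring – exactly the interaction points where monitoring, takeover, and mode-confusion failures tend to emerge.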

Failure Modes: What’s Different with AI
When we move to failure analysis, we need to consider three categories of failures:
AI-Specific Failures
AI systems can fail in ways that are distinctly different from human or traditional automated systems:
- Hallucinations: AI can generate plausible-sounding but entirely false information. This isn’t just making up academic references; it’s pattern-matching gone wrong. I’ve heard of an AI system that, asked to find patterns in incident data, returned hundreds of supposedly glass-related incidents; the reports merely mentioned equipment with glass components, not actual glass-related problems.
- Miscalculations: AI can get both simple and complex calculations wrong, sometimes in ways that seem nonsensical to humans but make sense within the algorithm’s logic.
- Training bias: If an AI is trained on biased data, it perpetuates and potentially amplifies those biases. The example of hand dryers failing to detect Black hands illustrates this – sensors trained predominantly on lighter skin tones simply didn’t work for everyone (Morgan State University Magazine, n.d.).
- Confirmation bias amplification: AI systems can narrow decision options prematurely, then generate convincing (but false) supporting evidence. This can shut down the exploratory thinking necessary for good decision-making.
- Misidentification: When AI identification goes wrong:
  - In military contexts, we’re increasingly hearing about the use of AI. The prospect of automating warfare is frightening, but unfortunately feels inevitable. AI systems used for targeting could misidentify targets based on faulty pattern recognition or incomplete sensor data. How confident do they need to be?
  - AI security systems, particularly those relying on biometric recognition, can be deliberately fooled through adversarial attacks. Face recognition systems, for example, can be deceived by carefully crafted photographs, masks, or even specific patterns of make-up. These vulnerabilities are particularly concerning in safety and security contexts where we rely on AI to detect threats or verify identities. The AI’s very confidence in its wrong decision makes the failure harder to catch.
  - In the medical domain, AI has been used to detect certain cancers for a long time now, and to good effect, but it is not 100% accurate (then again, neither are humans).
Human-AI Interaction Failures
New failure modes emerge from the interaction between humans and AI:
- Monitoring failures: Vigilance decrement, underload, and boredom when AI handles routine work successfully for long periods.
- Over-reliance or automation bias: Trusting AI recommendations even when wrong, particularly when under time pressure or cognitive load.
- Deskilling: Progressive loss of manual skills and tacit knowledge through lack of practice when AI handles routine operations.
- Mode confusion: Not understanding what the AI is currently doing, capable of doing, or supposed to be doing.
- Takeover failures: Inability to resume effective control when the AI fails or encounters situations beyond its capability.
- Defeating the automation: Creating workarounds that bypass AI safeguards, often for good operational reasons but with unintended safety consequences.
System Design Failures
Finally, we have failures that stem from how the human-AI system was designed:
- Systems designed for work-as-imagined rather than work-as-done (our ICU example)
- Inflexibility that forces unsafe workarounds
- Lack of transparency making AI decisions incomprehensible (the “black box” problem)
- Poor function allocation leaving humans with impossible monitoring tasks
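To show how these three categories can feed back into the analysis, here is a rough sketch – illustrative Python again; the prompt lists are examples drawn from this post, not an exhaustive SHERPA taxonomy – that attaches candidate failure-mode prompts to each step of the joint task, depending on whether the step is performed by the AI and whether it sits at a handover between agents.

```python
# Example failure-mode prompts drawn from the categories above - illustrative only.
AI_SPECIFIC = ["hallucination", "miscalculation", "training bias", "misidentification"]
INTERACTION = ["automation bias", "mode confusion", "takeover failure", "vigilance decrement"]

# Each step is (description, agent); a trimmed version of the earlier example.
steps = [
    ("AI alerts operator to deviation", "AI"),
    ("Operator reviews AI recommendation", "human"),
    ("Operator confirms or overrides action", "human"),
    ("AI executes adjustment", "AI"),
]

def candidate_failure_modes(step, previous):
    """Suggest prompts to consider for one step of the joint human-AI task."""
    _, agent = step
    prompts = []
    if agent == "AI":
        prompts += AI_SPECIFIC                       # failures of the AI itself
    if previous is not None and previous[1] != agent:
        prompts += INTERACTION                       # failures at the handover
    return prompts

for previous, step in zip([None] + steps[:-1], steps):
    print(f"{step[0]}: {candidate_failure_modes(step, previous)}")
```

System design failures, the third category, apply to the design as a whole rather than to individual steps, so they are best reviewed against the complete map rather than step by step.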
Performance Influencing Factors: New Considerations
When we conduct PIF analysis for hybrid systems, we need to extend our thinking beyond traditional factors:
AI-Specific PIFs:
- AI transparency: Can operators understand what the AI is doing and why? Black box systems increase the likelihood of misinterpretation and inappropriate trust.
- Intervention frequency: How often does the system require human action? This directly relates to Hancock’s principle – rare interventions mean poor responses.
- Feedback quality: Does the AI explain its decisions and confidence levels? Or just present outputs?
- Training adequacy: Do people understand both AI capabilities AND limitations? Knowing what the AI can’t do is as important as knowing what it can.
- Takeover scenario complexity: How difficult is it to resume manual control? How much time is available? Is the situation already degraded?
- Organisational culture: Can work-as-done be discussed openly, or does culture force workarounds underground? (Our ICU lesson.)
- Degraded mode design: What happens when AI fails partially? Is it obvious? Can humans continue safely?
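One simple way to make these factors auditable is to hold them as an explicit checklist and record a judgement against each. The sketch below is again illustrative Python; the question wording and the three-point rating scale are our own assumptions, not a formal SHERPA worksheet.

```python
# AI-specific PIFs from the list above, phrased as questions - illustrative only.
AI_PIFS = {
    "AI transparency": "Can operators understand what the AI is doing and why?",
    "Intervention frequency": "How often does the system require human action?",
    "Feedback quality": "Does the AI explain its decisions and confidence levels?",
    "Training adequacy": "Do people understand the AI's limitations as well as its capabilities?",
    "Takeover scenario complexity": "How difficult is it to resume manual control, and how much time is available?",
    "Organisational culture": "Can work-as-done be discussed openly?",
    "Degraded mode design": "Is partial AI failure obvious, and can humans continue safely?",
}

RATINGS = ("good", "adequate", "poor")  # assumed three-point judgement scale

def pifs_needing_attention(ratings):
    """Return the PIFs judged 'poor' so they can be prioritised for redesign."""
    return [name for name, rating in ratings.items() if rating == "poor"]

# Example assessment with made-up ratings.
assessment = {name: "adequate" for name in AI_PIFS}
assessment["Intervention frequency"] = "poor"   # rare interventions -> poor responses
assessment["Takeover scenario complexity"] = "poor"
print(pifs_needing_attention(assessment))
```

The value is less in the scoring than in forcing each question to be asked explicitly for the hybrid system, rather than assumed away during design.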
Traditional PIFs Through an AI Lens:
The traditional PIFs we examine in SHERPA still apply – task complexity, time pressure, interface design, procedures, training – but we need to interpret them differently. For example:
- Interface design now includes human-AI interaction design
- Training must cover AI limitations and handback scenarios
- Procedures need to address AI failure modes
- Time pressure may push operators to trust AI without verification
The Path Forward: Intentional Design
AI is here and accelerating. We cannot ignore it. We must be intentional about how we integrate it into critical work systems. SHERPA provides a systematic framework to analyse human-AI systems before deployment, not after disasters.
The approach requires three critical steps:
1. Understand work-as-done first. Don’t design AI systems around formal procedures or idealised workflows. Understand how work actually happens, including the informal practices and workarounds that make systems function safely in reality. Design AI that supports actual practice, not imagined practice.
2. Map the joint system explicitly. Use extended task analysis to show both human actors and AI agents, their responsibilities, and their interaction points. Make the hybrid system visible so you can analyse it systematically.
3. Identify vulnerabilities using extended failure modes and PIFs. Apply systematic failure analysis that considers AI-specific failures, human-AI interaction failures, and system design failures. Assess PIFs that are unique to hybrid systems alongside traditional factors.
The risks are both immediate and long-term. In the short term, poor implementation can introduce new failure paths and make systems less safe. In the long term, inappropriate function allocation can lead to deskilling, capability loss, and operators who cannot effectively intervene when needed.
Function allocation isn’t merely a technical decision about efficiency – it’s fundamentally about designing systems that humans can work with safely and effectively over time. The methodology is ready. The question is whether we’ll use it thoughtfully.
Get in Touch
If you’re grappling with how to assess AI or automation in your critical tasks, we’d welcome the conversation. Whether you’re in process safety, pharmaceutical manufacturing, or any sector where human performance matters, SHERPA can be extended to help you design and evaluate hybrid systems systematically.
Contact us to discuss how these approaches might apply to your context: info@humanreliability.com
References
Bainbridge, L. (1983). Ironies of Automation. Automatica, 19(6), 775-779.
Embrey, D.E. (1986). SHERPA: A Systematic Human Error Reduction and Prediction Approach. Proceedings of the International Topical Meeting on Advances in Human Factors in Nuclear Power Systems, Knoxville, Tennessee.
Hancock, P.A. (2014). Automation: How much is too much? Ergonomics, 57(3), 449-454.
Morgan State University Magazine. (n.d.). The Bias in the Machine: Facial Recognition Technology and Racial Disparities. Retrieved from https://magazine.morgan.edu/artificial-intelligence/
Acknowledgements
This blog was developed through collaboration with Claude (Anthropic), an AI assistant. The process itself exemplifies human-AI interaction: I provided the conceptual framework, research stories, and domain expertise; Claude helped structure the argument, identify gaps in logic, and draft prose; I reviewed, refined, and approved the final content. The result is better than either of us would have produced alone – precisely the kind of thoughtful human-AI collaboration we advocate designing into critical systems.
About the Author
Dominic Furniss is a Senior Human Factors Consultant at Human Reliability Associates, specialising in Safety Critical Task Analysis and human performance in high-hazard industries.