
Why AI Scribes Fail in the Emergency Department

Most AI scribes weren't built for the chaos of emergency medicine. Here are the concrete failure modes—hallucinations, noisy multi-speaker audio, generic MDMs, and note bloat—and what an ED-ready scribe would need to do differently.

Jacob LoMonaco, MD — Founder, Sampson

I've tried six different AI scribes. They all basically suck for the ED.

That's not a hot take; it's a conclusion I share with the overwhelming majority of my colleagues. In a recent study of AI scribe implementation in emergency departments, the tool was actually used in only 11.2% of encounters. Of the attendings who had access, barely a third tried it at all, and a small group of committed users accounted for most of the usage [1]. The adoption numbers tell a clear story: emergency physicians aren't ignoring AI scribes because they're technophobic. They're ignoring them because the tools don't work the way the ED works.

I want to be specific about why. Not the hand-wavy "AI isn't ready yet" dismissal — the actual, concrete failure modes that make these tools unusable in practice.

The Hallucination Problem

Ask an emergency physician why they don't use an AI scribe, and the first thing you'll hear is some version of: I don't trust it not to make things up.

They're right to be skeptical. In a comparative analysis of commercial AI scribe tools, omission errors accounted for 71% of all errors, addition errors for 19.4%, and outright incorrect facts for 6.5%. Error rates varied by vendor from 12.2% to 24.4% [2]. That means roughly one in eight to one in four data points in an AI-generated note could be wrong or missing, and you as the doc need to comb through and find every one of them, or it's potentially your ass.

In most clinical settings, this is a documentation nuisance. In the emergency department, it's a liability time bomb. An AI that fabricates a positive finding on review of systems, invents a lab value, or hallucinates a medication dose is producing a document that could follow a physician into a courtroom years later.

The problem isn't that AI might hallucinate. The problem is that physicians have no way to predict when it will (and it will), which means they can't trust any individual data point without verifying it against their own memory of the encounter. At that point, the tool that was supposed to save time now requires a line-by-line audit of its own output.

Everything the Patient Says Gets Treated as Gospel

Emergency departments are noisy, chaotic, and full of unreliable narrators. Patients speculate about their own diagnoses. Family members offer contradictory histories. The person in the next bed is loudly describing their symptoms to a nurse. None of this is unusual — it's completely normal.

Most AI scribes don't know the difference. They treat every statement captured by the microphone as equally valid clinical information and document it accordingly. When a patient says "I think my shoulder hurts because I don't eat enough fruit," a typical AI scribe will write something like "patient reports shoulder pain secondary to insufficient fruit intake" — presenting speculation as a clinical finding.

This gets genuinely dangerous with certain patient populations. Patients with somatic delusions, psychiatric presentations, or significant secondary gain can generate transcripts full of statements that are clinically relevant as observations about the patient's mental state but absolutely should not appear in the HPI as reported facts. An AI that can't make this distinction is a threat to the license of the doc using it.

Word error rates in AI transcription tell part of the story: controlled settings produce error rates around 8.7%, but conversational and multi-speaker scenarios push error rates above 50% [3]. The emergency department is, by definition, a multi-speaker environment where the most important clinical information is often embedded in the most chaotic conversations.

The MDM Problem

Here's where most AI scribes fall apart completely — and here's where it matters most.

The Medical Decision Making section is where the money lives and where the liability concentrates. It's the section that justifies your billing level, documents your clinical reasoning, and tells a future reviewer exactly what you were thinking and why. When a case gets pulled three years from now, almost nobody is reading your review of systems. They're reading your MDM.

Most AI scribes produce MDMs that read like a medical student's first attempt at a differential diagnosis. They'll list every condition that was mentioned during the encounter — including ones that were part of the patient's past medical history, not the current presentation — and generate a generic assessment that doesn't reflect the physician's actual thought process.

A good emergency medicine MDM documents specific things: which validated decision tools you used and what they scored, which studies you independently interpreted and what you found, which consultants you spoke with and when, what high-risk interventions you initiated. It captures the reasoning, not just the words.

When researchers compared AI-generated notes directly to human scribe notes, AI produced lower-quality documentation for complex cases, and physicians contributed 60% of note characters when using AI compared to 31% with human scribes [4]. The editing burden was highest in exactly the section that matters most.

The Proofreading Paradox

This is the objection that should keep every AI scribe company up at night: If I have to read the whole thing anyway, I might as well have written it myself.

It's a completely valid point. The entire value proposition of an AI scribe rests on the assumption that reviewing a note is faster than writing one. But when the error rate is high enough, or the note is bloated enough, or the MDM is generic enough to require a rewrite — that assumption collapses.

The data bears this out. While AI scribes reduced overall documentation time by about 28% for the physicians who used them consistently, the benefit showed a steep dose-response curve [1]. Sporadic users saw almost no time savings. This likely reflects the learning curve, but it also suggests that physicians who tried the tool and found the editing burden too high simply stopped using it — and their negative experience never shows up in the efficiency numbers.

The physicians who reported the highest satisfaction with AI scribes were the ones who used them for routine, protocol-driven visits rather than complex cases [5]. That's a telling pattern. The tool works when the clinical scenario is predictable and the note is straightforward. It breaks down when things get complicated.

The Bloat Problem

Emergency physicians have spent years refining their documentation to be concise and functional. Then an AI scribe comes along and produces a 1,200-word note for a simple laceration repair.

This isn't just an aesthetic complaint. Bloated notes take longer to review, make it harder to identify errors, and bury clinically important details in a wall of auto-generated text. They also create a perverse downstream problem: when every note is a novel, consultants and admitting physicians stop reading them, which defeats the purpose of thorough documentation in the first place.

Physicians consistently cite excessive note length and limited formatting options as barriers to adoption [5]. The irony is that AI scribes were supposed to solve documentation burden — but a tool that generates more text than necessary just creates a different kind of burden.

The ideal emergency medicine note isn't comprehensive for its own sake. Every sentence serves a purpose: clinical documentation, billing justification, or medicolegal protection. If a sentence doesn't do at least one of those things, it's noise.

The Workflow Problem Nobody Talks About

There's a subtler issue that doesn't show up in the research but matters enormously in practice: for many emergency physicians, writing the note is part of the clinical process.

Dictating an MDM isn't just documentation — it's the physician thinking out loud, organizing their clinical reasoning, pressure-testing their own assessment. The act of articulating "I considered a PE but the Wells score is zero and the patient has a much more likely alternative diagnosis" isn't redundant to their thinking — it is their thinking.

An AI that takes over this process entirely doesn't just change how the note gets written. It changes how the physician reasons through the case. That's not a feature. That's a problem.

So What Would Actually Work?

I don't think AI scribes in the ED are a lost cause. I actually think they are the inevitable future of ED documentation and that our field stands to gain much by building them the right way. The burnout data is too compelling to ignore — studies show they can reduce burnout by as much as 74% among consistent users, with significant improvements in cognitive load and work-life balance [6-7]. Physicians who successfully integrate these tools report getting home earlier and finishing charts on shift. That matters. That's people's lives.

Read that again. Up to a 74% reduction in burnout. I don't know about you, but a serious majority of the ED docs I'm personal friends with are actively looking for a path out of our field. Young or old, it doesn't matter: it seems like everyone is burned out and wants out of this job, and that should scare the shit out of everybody. But that's a topic for another day.

But the gap between the burnout reduction data and the 11.2% adoption rate tells you everything: the tools work in theory. They just don't work well enough in practice for most EPs to tolerate the tradeoffs.

Closing that gap would require a few things that don't currently exist in any of the six scribes I've tried, which, to be fair, is not all of them (it seems like every time I brush my teeth, someone comes out with a new one).

Accuracy that earns trust. The note needs to be reliable enough that physician review is a quick scan, not a forensic audit. That starts with medical-grade transcription, not general-purpose speech recognition hoping for the best in a loud department, and it means the system should omit uncertain information rather than guess. A missing detail is inconvenient. A fabricated one is dangerous. Finally, the scribe needs to be optimized for what will inevitably be crap audio quality. That can mean audio preprocessing before transcription, filtering the transcript based on transcription confidence (which carries its own set of risks), or a second pass that checks the final note output against the initial transcript.
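
To make the omit-rather-than-guess idea concrete, here's a minimal sketch, assuming a transcription engine that returns per-segment confidence scores. The Segment type, field names, and the 0.85 cutoff are illustrative, not any vendor's API.

```python
# A minimal sketch of "omit rather than guess": low-confidence audio becomes
# an explicit gap flagged for the physician instead of an invented sentence.
from dataclasses import dataclass


@dataclass
class Segment:
    text: str
    confidence: float  # 0.0-1.0, as reported by the (hypothetical) ASR engine


CONFIDENCE_FLOOR = 0.85  # illustrative cutoff, tuned per deployment in practice


def partition_segments(segments: list[Segment]) -> tuple[list[str], list[str]]:
    """Split transcript segments into usable text and flagged gaps."""
    usable, flagged = [], []
    for seg in segments:
        if seg.confidence >= CONFIDENCE_FLOOR:
            usable.append(seg.text)
        else:
            # Don't guess: surface the gap so the physician can fill it in.
            flagged.append(f"[unclear audio, confidence {seg.confidence:.2f}]")
    return usable, flagged
```

The point is the branch: uncertain audio shows up as a visible gap in the draft, not as a fabricated detail the doc has to catch later.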

Clinical judgment about what belongs in the note. The system has to distinguish between the physician's clinical assessment and the patient's speculation, between relevant history and background noise, between the physician's voice and the three other conversations happening within microphone range. Not everything said during an encounter is documentation. An AI scribe that doesn't understand that will always produce notes that require heavy editing.
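
As a rough sketch of that filtering step, assume the transcript has already been diarized into turns with a speaker role, and that a downstream classifier has flagged speculation; neither is something current scribes do reliably, and the Turn type and field names here are hypothetical. Patient and family speculation gets attributed as a statement rather than written as a finding, and background conversations are dropped outright.

```python
# A hypothetical post-diarization filter: keep clinical content, attribute
# speculation to its source, and drop background chatter entirely.
from dataclasses import dataclass


@dataclass
class Turn:
    speaker_role: str     # "physician", "patient", "family", or "background"
    text: str
    is_speculation: bool  # e.g., set by a downstream classifier (assumed here)


def filter_for_note(turns: list[Turn]) -> list[str]:
    lines = []
    for turn in turns:
        if turn.speaker_role == "background":
            continue  # the next bed's conversation never belongs in the chart
        if turn.speaker_role in ("patient", "family") and turn.is_speculation:
            # Attribute speculation instead of documenting it as a finding.
            lines.append(f'{turn.speaker_role.capitalize()} states: "{turn.text}"')
        else:
            lines.append(turn.text)
    return lines
```

With that distinction in place, "I think my shoulder hurts because I don't eat enough fruit" lands in the note as an attributed statement about the patient, not as a clinical explanation for the pain.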

An MDM that reflects real decision-making. The medical decision making section can't be some tired dotphrase slapped onto the bottom of a note. It needs to capture the physician's actual reasoning — the decision tools, the independent interpretations, the consultant discussions, the risk stratification. Because that's what gets billed and that's what gets read three years later by someone deciding whether you met the standard of care.
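
One way to picture the difference is structural: instead of free text generated from whatever was said out loud, the MDM gets assembled from explicit reasoning elements. A minimal sketch, with illustrative field names that are not any vendor's schema:

```python
# A sketch of an MDM built from explicit reasoning elements rather than
# generated as generic free text. Field names are illustrative only.
from dataclasses import dataclass, field


@dataclass
class MDM:
    decision_tools: dict[str, str] = field(default_factory=dict)          # e.g. {"Wells (PE)": "0, low risk"}
    independent_interpretations: list[str] = field(default_factory=list)  # e.g. "ECG personally reviewed: no STE"
    consultations: list[str] = field(default_factory=list)                # e.g. "Case discussed with cardiology"
    risk_statement: str = ""                                              # the physician's own risk stratification

    def render(self) -> str:
        """Assemble the section only from elements the physician actually supplied."""
        parts = [f"{tool}: {result}." for tool, result in self.decision_tools.items()]
        parts.extend(self.independent_interpretations)
        parts.extend(self.consultations)
        if self.risk_statement:
            parts.append(self.risk_statement)
        return " ".join(parts)
```

The constraint is the feature: every element is either something the physician actually did or it isn't there, which leaves nothing generic to pad the section with.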

Documentation personalized to the physician. Different docs have different documentation styles. A thorough, detailed HPI tells a jury you listened and took the time. A clean, tight note gets you to the next patient when the department is blowing up. Both are legitimate, and an ED-ready scribe should learn the doc's style and match it rather than forcing everyone into the same template.

I seriously believe AI scribes are the future of ED documentation. We’ve been complaining about documentation requirements for decades, and we’re literally looking at the solution. But that doesn’t mean that what’s out there now is going to cut it.

References

  1. Preiksaitis C, Alvarez A, Winkel M, et al. Ambient Artificial Intelligence Scribe Adoption and Documentation Time in the Emergency Department. Annals of Emergency Medicine. 2026.
  2. Arko IV L, Hudelson C, Kumar J, et al. Documenting Care With AI: A Comparative Analysis of Commercial Scribe Tools. Studies in Health Technology and Informatics. 2025.
  3. Ng JJW, Wang E, Zhou X, et al. Evaluating the Performance of Artificial Intelligence-Based Speech Recognition for Clinical Documentation: A Systematic Review. BMC Medical Informatics and Decision Making. 2025.
  4. Morey J, Jones D, Walker L, et al. Ambient Artificial Intelligence Versus Human Scribes in the Emergency Department. Annals of Emergency Medicine. 2025.
  5. Atiku S, Olakotan O, Owolanke K. Usability-Related Barriers and Facilitators Influencing the Adoption and Use of AI Scribes in Healthcare: A Scoping Review. Journal of Evaluation in Clinical Practice. 2026.
  6. Olson KD, Meeker D, Troup M, et al. Use of Ambient AI Scribes to Reduce Administrative Burden and Professional Burnout. JAMA Network Open. 2025.
  7. Schneider KR, Swann-Thomsen HE, Ribbens TG, et al. The Impact of Artificial Intelligence Scribes on Physician and Advanced Practice Provider Cognitive Load and Well-Being. Journal of the American Medical Informatics Association. 2026.

Sampson Medical is building the AI-powered scribe that emergency medicine deserves — designed by an ER doc who got tired of charting through dinner. No bloat. No bullshit. Just your note, done right.

Tags: ai-scribe, emergency-medicine, documentation, mdm, product