When Your AI Meeting Notes Invent Words You Never Said: The Hallucination Crisis Inside Cloud Transcription
Published June 05, 2026
- Cloud AI transcription tools don't just mis-hear — they invent entire sentences, including violent and defamatory ones, that no one in the meeting ever said.
- Peer-reviewed research found OpenAI's Whisper hallucinated in ~1.4% of clips, and a University of Michigan researcher found fabrications in 8 of 10 public meeting samples.
- Once a hallucinated quote is saved in a vendor's cloud, it becomes a discoverable business record — and can be subpoenaed in litigation.
- Some Whisper-based commercial products delete the source audio, leaving no way to verify what was actually said.
- On-device ASR like Apple's Speech Recognition fails by dropping words, not by fabricating quotes — and the transcript never leaves your device to be weaponized later.
Quick answer: No, you cannot assume a cloud AI transcript contains only what was actually said. Popular cloud AI transcription tools can and do invent words, sentences, and entire quotes that no one in the meeting spoke. A peer-reviewed study found that OpenAI's Whisper, used inside many enterprise meeting tools, hallucinated content in roughly 1.4% of the clips tested, and a University of Michigan researcher separately reported fabrications in 8 of 10 public-meeting transcriptions examined. Those fake quotes get stored, shared, and may be discoverable in litigation.
Cloud AI transcription tools don't just mishear you. They fabricate entire sentences — including violent, defamatory, and medically dangerous ones — that nobody in the meeting ever spoke. And those fake quotes get stored, shared, and may be discoverable in court.
The problem nobody talks about: AI transcripts that make things up
When organizations debate AI meeting tools, the conversation almost always focuses on consent, wiretap laws, and biometric privacy. Those risks are real, and we have covered the wave of BIPA class actions hitting AI meeting bots in detail. But there is a quieter, more insidious failure mode that the industry has been remarkably slow to confront: the transcripts themselves are not reliable. They contain words that were never spoken.
This is not a theoretical concern. It is a documented phenomenon called hallucination, and it has been studied extensively in the specific model powering a huge fraction of the cloud AI transcription market — OpenAI's Whisper. Science magazine reported on the underlying research, noting that Whisper invented sentences in roughly 1.4% of audio transcriptions tested, with a disconcerting share of those fabrications containing offensive or harmful content.
What Whisper hallucinations actually look like
The Cornell-led research team, working with Carnegie Mellon's AphasiaBank dataset, fed thousands of 10-second audio clips into Whisper and compared the output to ground-truth transcripts. The fabrications were not subtle typos. In one widely cited example documented by the Associated Press investigation, a speaker said: "He, the boy, was going to, I'm not sure exactly, take the umbrella." Whisper transcribed: "He took a big piece of a cross, a teeny, small piece … I'm sure he didn't have a terror knife so he killed a number of people."
Read that twice. The model didn't merely garble the words — it invented an entirely fictional admission of violence and attributed it to a speaker who never said anything of the kind.
Researchers found Whisper hallucinations included "explicit harms such as perpetuating violence, making up inaccurate associations, or implying false authority," and that the system invented racially offensive content, violent statements, and the names of non-existent medications. According to CIO's coverage of the study, roughly 40% of Whisper's hallucinations could have harmful consequences because the model misrepresented the speaker's intent.
How widespread is this? Worse than the 1.4% number suggests
The headline academic figure is about 1.4% of clips containing fabricated content. That sounds small until you scale it to the volume of meetings a typical organization runs through cloud transcription — and until you look at the failure rates outside the controlled academic test set.
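To put rough numbers on that scaling, here is a back-of-the-envelope sketch. It assumes, purely for illustration, that each 10-second segment of a meeting is fabricated independently with the 1.4% clip-level rate from the academic study. Real failures cluster around silence and noise, so treat this as an order-of-magnitude estimate, not a prediction. The function names and the 30-minute meeting are hypothetical.

```python
def expected_fabricated_segments(segments: int, rate: float = 0.014) -> float:
    """Expected number of segments that contain fabricated text."""
    return segments * rate


def prob_at_least_one(segments: int, rate: float = 0.014) -> float:
    """Chance a transcript contains one or more fabricated segments,
    assuming (unrealistically) that segments fail independently."""
    return 1.0 - (1.0 - rate) ** segments


# Hypothetical 30-minute meeting cut into 10-second segments.
meeting_segments = (30 * 60) // 10  # 180 segments
print(f"expected fabricated segments: {expected_fabricated_segments(meeting_segments):.2f}")
print(f"chance of at least one: {prob_at_least_one(meeting_segments):.1%}")
```

Under these assumptions, a single half-hour meeting would be expected to contain about 2.5 fabricated segments, with a better than 90% chance of at least one. The exact figures matter less than the shape of the curve: a small per-clip rate compounds quickly across a long recording.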
Reports from practitioners are dramatically worse. A University of Michigan researcher told AP that Whisper hallucinated in eight out of every ten audio transcriptions of public meetings examined. A machine-learning engineer who analyzed more than 100 hours of Whisper output found hallucinations in roughly half of it. A third developer who generated 26,000 transcripts with Whisper reported finding fabrications in nearly every one. As ACDIS summarized the AP investigation, this is fundamentally different from the kind of misspellings everyone expects from transcription software.
The public-meetings figure is especially relevant to our readers because those conditions look a lot like your Zoom calls: multiple speakers, background noise, varying audio quality, interruptions, and silences while someone shares a screen. That is exactly the environment in which Whisper appears to fail catastrophically.
Why generative ASR systems hallucinate (and what that means for meetings)
Whisper is not the old kind of dictation software. It is a sequence-to-sequence transformer trained to produce fluent, plausible-sounding text from audio input. That training objective is the source of the problem. When the model encounters audio it cannot confidently decode (silence, an accent it has not seen much of, a speech disfluency, background noise, a moment of crosstalk), it does not simply leave a gap. Its decoder emits whatever text it predicts is most likely given the context.
An arXiv investigation of Whisper hallucinations showed the problem clearly: when researchers fed Whisper recordings that contained no speech at all, the model still generated text. Pure silence was "transcribed" as sentences. The same paper notes that silences at the beginning and end of audio files appear to directly trigger hallucinations.
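One partial mitigation is to flag suspicious segments for human review instead of trusting the whole transcript. The sketch below assumes you have per-segment metadata shaped like the output of the open-source `whisper` package, where each segment carries `no_speech_prob`, `avg_logprob`, and `compression_ratio` fields. Commercial meeting tools may not expose these fields at all. The function name, the thresholds, and the sample data are illustrative assumptions, and a heuristic like this will not catch every hallucination.

```python
from typing import Dict, List

# Illustrative thresholds (assumptions); tune them on your own audio.
NO_SPEECH_PROB_MAX = 0.6
AVG_LOGPROB_MIN = -1.0
COMPRESSION_RATIO_MAX = 2.4


def flag_suspect_segments(segments: List[Dict]) -> List[Dict]:
    """Return the segments a human should check against the source audio.

    A segment with a missing field is simply not flagged on that criterion.
    """
    suspects = []
    for seg in segments:
        reasons = []
        if seg.get("no_speech_prob", 0.0) > NO_SPEECH_PROB_MAX:
            reasons.append("model suspects this span contains no speech")
        if seg.get("avg_logprob", 0.0) < AVG_LOGPROB_MIN:
            reasons.append("low decoding confidence")
        if seg.get("compression_ratio", 0.0) > COMPRESSION_RATIO_MAX:
            reasons.append("unusually repetitive text")
        if reasons:
            suspects.append({
                "start": seg["start"],
                "end": seg["end"],
                "text": seg["text"],
                "reasons": reasons,
            })
    return suspects


# Hypothetical segments: the second sits over near-silence.
example = [
    {"start": 0.0, "end": 4.2, "text": "Let's review the Q3 numbers.",
     "no_speech_prob": 0.02, "avg_logprob": -0.21, "compression_ratio": 1.3},
    {"start": 4.2, "end": 9.8, "text": "Thank you for watching.",
     "no_speech_prob": 0.91, "avg_logprob": -1.40, "compression_ratio": 0.9},
]

for item in flag_suspect_segments(example):
    print(item["start"], item["end"], item["text"], item["reasons"])
```

A reviewer would then replay only the flagged time ranges against the recording, which is feasible only if the source audio has been retained.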
Cornell's Allison Koenecke and colleagues reported that these failures were not present in the other commercial ASR systems they compared, from Google, Amazon, AssemblyAI, and RevAI. Among the systems tested, Whisper appears to be unusually prone to fabrication, and OpenAI itself has cautioned against using the tool in "high-risk domains."
Whisper is everywhere — including inside meeting tools you probably use
You may be thinking: "My company doesn't use Whisper directly." The trouble is that Whisper is embedded in a sprawling supply chain of products. CIO reported that Whisper is integrated into Microsoft's and Oracle's cloud platforms and into certain versions of ChatGPT, and that medical AI vendor Nabla had used Whisper-based tools to transcribe roughly 7 million medical visits across more than 30,000 clinicians.
The pattern repeats across the meeting tool stack. Many AI "summarize my Zoom call" features are wrappers around third-party ASR — sometimes Whisper, sometimes proprietary models that exhibit similar failure modes. Zoom's AI Companion documentation explicitly notes that the product uses third-party model providers, with retention controls that vary by feature. Even if you trust the vendor's privacy policy, you are also implicitly trusting the accuracy of an ASR model you cannot inspect.
The legal time bomb: fabricated quotes as discoverable business records
Here is where the accuracy story collides with the privacy story most readers are already familiar with. When a cloud transcription tool generates a transcript, that transcript is typically stored — sometimes indefinitely, sometimes under the customer's retention settings, sometimes in a vendor data lake used to improve models. We have covered the retention practices of the major AI meeting assistants in depth.
What that means for hallucinations: a fabricated quote attributed to a real, named employee is now a written record sitting in a vendor's database. Law firm Duane Morris warned in a February 2026 analysis that "AI-transcribed conversations, meeting minutes, and/or summaries may become discoverable in litigation" and that businesses must understand that AI tools "can create a permanent, searchable record that may later be preserved and produced in litigation." None of that analysis assumes the transcript is accurate.
Worse, some Whisper-based products actively destroy the evidence that would let you challenge a hallucination. CIO reported on a commercial medical transcription product built on Whisper that "deletes the underlying audio from which transcriptions are generated, leaving medical staff no way to verify their accuracy." If your AI meeting vendor follows the same pattern, you have a written record that purports to be what someone said, with no audio to disprove it.
The defamation question: can an AI fabricate a quote that is legally actionable?
Courts are now actively grappling with whether AI-fabricated statements can give rise to defamation liability. The first major case to reach a decision, Walters v. OpenAI, was won by the AI company on the basis that the recipient should have verified the output. But the legal frontier is wide open. Minnesota Lawyer reported that emerging "AI-assisted libel" risks fall into four categories: hallucination, juxtaposition, omission, and misquote.
Meeting transcripts arguably present the cleanest possible fact pattern for an eventual plaintiff. Unlike a chatbot summarizing a stranger, an AI meeting transcript names a specific identifiable employee, places them in a specific meeting, and purports to record their exact words. If a Whisper-based tool inserts "I'm sure he didn't have a terror knife so he killed a number of people" into a transcript attributed to your VP of Sales — and that transcript is then shared with HR, a client, or a regulator — the publication element of defamation is satisfied without much creativity. The Crowell & Moring analysis of AI defamation theory notes that defenders may argue hallucinations cannot meet the "actual malice" standard, but for private-figure plaintiffs the bar is mere negligence — and deploying an ASR system known to fabricate violent content could meet it.
Cloud generative ASR vs on-device ASR: a different failure mode
The hallucination problem is not generic to all transcription technology. It is specific to a particular architectural choice: generative, decoder-heavy speech models trained to produce fluent prose. Apple's on-device Speech Recognition framework, which Basil AI uses, takes a different approach optimized for dictation and short utterances on consumer devices. When it fails, it tends to fail by dropping a word, mis-recognizing a homophone, or producing a partial result — not by hallucinating a fictional sentence about violence.
Apple's Speech framework documentation describes a system that runs natively on the device, can operate without a network connection, and is designed for low-latency real-time use. As Apple's privacy overview states, on-device processing means the underlying audio never has to leave the user's hardware in the first place.
Here is a side-by-side comparison of the failure modes and their downstream consequences:
| Dimension | Cloud generative ASR (e.g., Whisper-based meeting tools) | On-device ASR (Apple Speech Recognition / Basil AI) |
|---|---|---|
| Architecture | Sequence-to-sequence transformer trained to produce fluent prose | Discriminative ASR optimized for dictation and short utterances |
| Typical failure mode | Invents whole sentences, including violent or defamatory content | Drops or mis-recognizes individual words; rarely fabricates phrases |
| Documented hallucination rate | ~1.4% in academic tests; up to 8 of 10 in public meeting audio | Not generative; no comparable hallucination phenomenon documented |
| Where audio is processed | Vendor cloud servers (often US-based) | Locally on iPhone or Mac, via Apple Neural Engine |
| Where transcript is stored | Vendor database, often indefinitely under broad ToS | On your device and/or your iCloud, under your control |
| Audio retention for verification | Some products auto-delete source audio; transcript is the only record | You retain the recording locally and can re-listen any time |
| Discoverability risk if hallucinated | Fabricated quote is a stored business record subject to subpoena | Local file under your control; no third-party record to compel |
| BIPA/CIPA exposure | High — see BIPA lawsuit wave | No voiceprint sent to a third party |
The regulatory pincer: accuracy is now an explicit compliance issue
Accuracy used to be a quality-of-service problem. In 2026 it is increasingly a compliance problem. Article 5 of the GDPR requires that personal data be "accurate and, where necessary, kept up to date." A meeting transcript that attributes a fabricated quote to a named employee is, by definition, inaccurate personal data being processed at scale. EU data protection authorities have not yet brought a flagship AI transcription accuracy case, but the legal hook is plainly there.
In healthcare, the problem is even more acute. HIPAA's Privacy Rule requires covered entities to maintain accurate medical records, and the Association of Clinical Documentation Integrity Specialists (ACDIS) has flagged the Whisper findings as a documentation-integrity risk. A clinical AI scribe that invents a medication name in a chart note isn't merely embarrassing; it is a patient-safety event.
HR contexts are equally exposed. As HR Executive reported, employment lawyers warn that AI transcription accuracy issues can create disparate-impact liability when models consistently misunderstand accents, speech impediments, or other characteristics tied to protected classes — exactly the speech patterns where Cornell researchers found Whisper failed worst.
How Basil AI solves this: on-device ASR with the audio under your control
Basil AI takes the opposite architectural bet from Whisper-based cloud meeting tools. Transcription runs on your iPhone or Mac using Apple's on-device Speech Recognition. Your audio never travels to a third-party server. There is no vendor data lake accumulating transcripts. There is no Whisper-style generative model trying to "complete" silences with plausible-sounding sentences.
That architectural choice matters in three specific ways for the hallucination problem:
- Different failure mode. When on-device ASR encounters audio it cannot decode, you get a missing or garbled word — not a fabricated sentence about violence or drugs. Your transcript is honest about uncertainty rather than fluent about fiction.
- You keep the source audio. Basil AI's local recording stays on your device for the duration you choose. If anything in the transcript looks wrong, you can play back the actual audio and verify what was said. Some Whisper-based commercial products delete the source audio entirely.
- No discoverable third-party record. Because nothing leaves your device, a hallucinated quote — or any other transcript artifact — never becomes a business record sitting in a vendor's database that can be subpoenaed in litigation. You decide what to keep, what to share, and what to delete.
For privileged conversations, this is the difference between a defensible workflow and a permanent liability. For technical detail on how the underlying architecture works, see our deep dive on where Zoom AI Companion sends your meeting data and how on-device processing differs.
What to do if you must use cloud AI transcription
Many organizations will continue to use cloud AI meeting tools for legitimate reasons. If you are in that camp, the Whisper research suggests several practical guardrails:
- Never destroy the source audio. Some vendors offer or default to audio deletion. Turn that off. The audio is your only check on the transcript.
- Require human review before circulation. AI summaries should not auto-email or auto-post to Slack. A human should read the output against the audio before any transcript reaches HR, a client, or a regulator.
- Set aggressive retention limits. Storing a fabricated quote for seven years multiplies discovery risk. Set short, defensible retention periods aligned with business need.
- Avoid AI transcription for high-stakes contexts. Privileged legal calls, clinical encounters, HR investigations, and disciplinary meetings are exactly the cases where a fabricated quote causes the most damage. Default to no-AI in those rooms.
- Read the privacy policy. The Otter.ai privacy policy and similar vendor documents grant broad rights to use your content. Know what you have agreed to.
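The first four guardrails above can be expressed as an automated pre-flight check. This is a minimal sketch, not a real vendor API: the `TranscriptionSettings` fields, the meeting-type labels, and the 90-day retention ceiling are all hypothetical and would need to be mapped onto whatever settings your tool actually exposes. Reading the privacy policy remains a human task.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical values for illustration; set them to match your own policy.
MAX_RETENTION_DAYS = 90
HIGH_STAKES_MEETINGS = {
    "privileged_legal", "clinical", "hr_investigation", "disciplinary",
}


@dataclass
class TranscriptionSettings:
    meeting_type: str
    ai_transcription_enabled: bool
    keep_source_audio: bool
    human_review_before_sharing: bool
    retention_days: int


def guardrail_violations(s: TranscriptionSettings) -> List[str]:
    """List every guardrail the given settings break (empty list = OK)."""
    if not s.ai_transcription_enabled:
        return []  # no AI transcript is produced, so nothing to check
    problems = []
    if s.meeting_type in HIGH_STAKES_MEETINGS:
        problems.append("AI transcription enabled for a high-stakes meeting")
    if not s.keep_source_audio:
        problems.append("source audio is not retained for verification")
    if not s.human_review_before_sharing:
        problems.append("transcript can circulate without human review")
    if s.retention_days > MAX_RETENTION_DAYS:
        problems.append(f"retention exceeds {MAX_RETENTION_DAYS} days")
    return problems


risky = TranscriptionSettings(
    meeting_type="hr_investigation",
    ai_transcription_enabled=True,
    keep_source_audio=False,
    human_review_before_sharing=False,
    retention_days=7 * 365,
)
for problem in guardrail_violations(risky):
    print("VIOLATION:", problem)
```

Running a check like this before each recorded meeting, or as part of a periodic audit of workspace settings, turns the guardrails from a memo into something enforceable.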
The bottom line
The privacy debate about AI meeting tools has, until now, focused almost entirely on consent and data handling. Those concerns are legitimate and accelerating. But the accuracy story is in some ways more frightening. A perfectly compliant, perfectly consented cloud transcription system can still hand you back a written record of a meeting that contains words nobody ever said — including violent, defamatory, or medically dangerous ones.
If that record then gets stored, shared, summarized, and entered into discovery, the organization that deployed the tool is the one holding the liability. On-device ASR doesn't make every transcript perfect. But it changes the failure mode from "the AI invented a sentence and put it in a vendor's database" to "the AI dropped a word and you can replay the audio to check." For sensitive meetings, that is the only acceptable architecture.