When you invite a cloud AI notetaker into your meeting, you probably expect it to generate a transcript and maybe a summary. What you may not expect is that your words — your ideas, your strategy discussions, your confidential negotiations — are being harvested to train someone else's artificial intelligence.
A wave of class action lawsuits, regulatory enforcement, and catastrophic data breaches in 2025 and 2026 has exposed what privacy advocates have warned about for years: cloud-based meeting transcription tools aren't just selling you a service. They're monetizing your conversations as AI training data — often without anything resembling meaningful consent.
The Otter.ai Class Action: A Landmark Case
In August 2025, a federal class action filed against Otter.ai in the Northern District of California cut to the heart of the secondary use problem. As NPR reported, the lawsuit accuses Otter of "deceptively and surreptitiously" recording private conversations and using them to train its AI systems without permission from participants.
The plaintiff, Justin Brewer of California, wasn't even an Otter customer. His privacy was violated simply because another meeting participant used the tool. According to the complaint, Otter's notetaker joined his Zoom meeting, captured the entire conversation, and fed it into Otter's machine learning pipeline — all without his knowledge or consent.
The legal claims are sweeping: violations of the Electronic Communications Privacy Act, the Computer Fraud and Abuse Act, and California's Invasion of Privacy Act. But the most consequential allegation may be the simplest one: that Otter used conversations to train its speech recognition and machine learning models while shifting the responsibility for obtaining consent to individual account holders.
As the National Law Review analysis put it, the legal theory is straightforward: Otter sought permission only from meeting hosts — and sometimes not even from them — while capturing and repurposing the speech of every participant in the room.
The "Secondary Use" Problem
The Otter case illuminates a broader pattern that extends across the entire cloud transcription industry. When you agree to a cloud AI tool's terms of service, you're typically consenting to transcription. But buried in the fine print is something much more expansive: permission to use your data for "product improvement," "model training," or "service enhancement."
This is what privacy lawyers call the "secondary use" problem. Your data was collected for one purpose (transcription) but repurposed for another (training AI models that the company sells to others). The consent you gave — if you gave any — was for transcription, not for becoming an unpaid contributor to a company's AI development pipeline.
Otter's privacy policy requires users to check a box granting Otter and third parties the right to ingest private conversations "for training and product improvement purposes." But the people being recorded — your colleagues, clients, opposing counsel, job candidates — never checked that box. They may not even know the tool is running. As we explored in our analysis of AI meeting bots and wiretap law, this consent model fails entirely in all-party consent states.
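The consent gap can be made concrete with a small sketch. The Python below is a hypothetical illustration, not Otter's or any other vendor's actual logic: the `Participant` type, the purpose strings, and the `may_use` function are all invented for this example. It assumes a tool that tracks, for each person in the meeting, which purposes that person agreed to, and that allows a use only when every participant agreed to that specific purpose.

```python
from dataclasses import dataclass, field


@dataclass
class Participant:
    """Hypothetical record of what one meeting participant agreed to."""
    name: str
    consented_purposes: set[str] = field(default_factory=set)


def may_use(participants: list[Participant], purpose: str) -> bool:
    """Allow a purpose only if every participant consented to that purpose."""
    return bool(participants) and all(
        purpose in p.consented_purposes for p in participants
    )


# Illustrative scenario: the host ticked the box for both purposes,
# while the guest only agreed to the meeting being transcribed.
host = Participant("host", {"transcription", "model_training"})
guest = Participant("guest", {"transcription"})

host_only_training = may_use([host], "model_training")
all_party_transcription = may_use([host, guest], "transcription")
all_party_training = may_use([host, guest], "model_training")

print(host_only_training, all_party_transcription, all_party_training)
```

Checking only the host makes training look permitted. Checking every participant against the specific purpose shows that transcription is covered but model training is not, which is the secondary use problem in miniature.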
When AI Training Data Gets Breached: The Mercor Catastrophe
The theoretical risk of cloud-stored conversation data became horrifyingly concrete in March 2026 when Mercor, a $10 billion AI training startup, was breached through a supply chain attack. The stolen data included approximately 3 terabytes of video interview recordings, personal data of over 40,000 contractors, and proprietary AI training methodologies belonging to frontier labs including Meta, OpenAI, and Anthropic.
The breach demonstrated exactly what happens when companies collect vast archives of recorded conversations for AI training: that data becomes an irresistible target. As PYMNTS reported, at least seven class-action lawsuits were filed against Mercor, with plaintiffs alleging the company's practices included "using recorded candidate interviews to train AI models" without adequate protections.
Meta indefinitely paused all work with Mercor. The stolen data is reportedly being auctioned on the dark web. And the 40,000+ individuals whose voices and faces were captured in those interview recordings now face a lifetime risk of deepfake impersonation — because unlike a password, you cannot change your voice.
GDPR Enforcement Is Escalating — and AI Training Is in the Crosshairs
European regulators are making it clear that using personal data for AI training without proper legal basis will trigger severe consequences. GDPR fines reached €1.2 billion in 2025 alone, with cumulative penalties exceeding €7.1 billion since 2018. Italian regulators have been particularly aggressive, with the Garante imposing a €5 million fine against the Replika chatbot maker for GDPR violations related to AI data processing.
The GDPR's purpose limitation principle, set out in Article 5(1)(b), requires that personal data be collected for specified, explicit purposes and not be further processed in a way that is incompatible with them. Using meeting transcripts to train AI models is a textbook violation of this principle when participants haven't specifically consented to that secondary use and no other legal basis applies.
And the regulatory pressure is intensifying. The EU AI Act reaches full enforcement for high-risk AI systems on August 2, 2026, creating a second penalty layer whose top tier can reach €35 million or 7% of global turnover, well above GDPR's maximum of €20 million or 4%. The European Data Protection Board's April 2025 guidance clarified that large language models rarely achieve anonymization standards, meaning companies deploying AI tools trained on personal data must conduct comprehensive impact assessments.
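To see how the two penalty layers compare, here is a rough worked calculation in Python. It assumes both regimes cap fines at the higher of a fixed amount and a percentage of worldwide annual turnover, using GDPR's top tier (€20 million or 4%) and the AI Act's top tier (€35 million or 7%). The `penalty_ceiling` helper and the €2 billion turnover figure are hypothetical, chosen only for illustration, and the result is a statutory ceiling, not a prediction of any actual fine.

```python
def penalty_ceiling(turnover_eur: int, fixed_cap_eur: int, percent: int) -> float:
    """Return the higher of a fixed cap and a percentage of annual turnover."""
    return max(fixed_cap_eur, turnover_eur * percent / 100)


# Hypothetical company with EUR 2 billion in worldwide annual turnover.
turnover = 2_000_000_000

gdpr_max = penalty_ceiling(turnover, 20_000_000, 4)    # GDPR top tier
ai_act_max = penalty_ceiling(turnover, 35_000_000, 7)  # AI Act top tier

print(f"GDPR ceiling:   EUR {gdpr_max:,.0f}")
print(f"AI Act ceiling: EUR {ai_act_max:,.0f}")
```

For this hypothetical turnover, the ceilings come to €80 million under GDPR and €140 million under the AI Act. For a small company, the fixed caps dominate instead of the percentages.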
The Real Cost of "Free" Transcription
There's an old adage in tech: if the product is free, you are the product. Cloud AI meeting tools have taken this principle to an extreme. When a transcription service offers a generous free tier, the business model isn't advertising — it's data extraction.
Every meeting you record becomes training data. Every speaker's voice characteristics become inputs for speaker diarization models. Every industry-specific term, every jargon-laden strategy discussion, every confidential number becomes part of the corpus that makes the AI smarter — and more valuable.
This isn't hypothetical. As our analysis of shadow AI meeting tools details, Verizon's 2026 DBIR found that employee use of unapproved AI tools has tripled in a year, from 15% to 45%, making shadow AI the third most common way employees inadvertently leak data. Every one of those unsanctioned cloud transcription sessions represents company data flowing into someone else's training pipeline.
The Legal Tsunami Is Just Beginning
The Otter.ai class action is not an isolated event. It's the leading edge of a litigation wave targeting the entire cloud transcription industry:
- Brewer v. Otter.ai (August 2025): Federal class action in the Northern District of California alleging secret recording and AI training without consent.
- Cruz v. Fireflies.AI (December 2025): BIPA class action alleging voiceprint collection and AI training without written notice or consent from meeting participants.
- Microsoft Teams BIPA Lawsuit (February 2026): Class action alleging Teams' speaker attribution feature creates and stores voiceprints in violation of Illinois biometric privacy law.
- Mercor Data Breach Litigation (April 2026): Multiple class actions filed after breach exposed 3TB of interview recordings used for AI training.
Legal experts at Littler Mendelson identified seven distinct categories of risk that AI transcription tools create for employers, including violations of privacy and wiretap laws, exposure of privileged information, discrimination concerns, and increased discovery costs. They warn that organizations using these tools face liability even when they aren't the technology provider.
Why On-Device Transcription Is the Only Real Solution
The secondary use problem has a simple architectural solution: if your meeting audio never leaves your device, it can never be used to train someone else's AI.
On-device transcription, a mode supported by Apple's Speech Recognition framework on compatible devices, processes audio locally on the device's Neural Engine. When recognition is restricted to the device, no audio is transmitted to any server, no transcript is stored in any cloud database, and no data is available for AI model training, product improvement, or any other secondary use.
This isn't just a privacy preference — it's an architectural guarantee. When processing happens on-device:
- No training data extraction: Your conversations cannot be harvested because they never leave your hardware.
- No consent gap: There's no third-party vendor to consent to, no terms of service that claim rights to your content.
- No breach exposure: Your meeting recordings can't be part of the next Mercor-style breach because they don't exist on any server.
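- No vendor-side regulatory risk: GDPR purpose limitation, CCPA data minimization, and BIPA biometric protections are addressed by design, not by policy, because no third party ever processes the audio. Recording-consent laws still apply to whoever starts the recording.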
- True data ownership: You control your transcripts. You decide what to keep, export, or delete. No vendor lock-in, no dark patterns preventing deletion.
Apple has made on-device AI processing a cornerstone of its platform strategy. As Apple states, the cornerstone of Apple Intelligence is on-device processing — "aware of your personal information without collecting your personal information." Their foundation models are specifically designed to run on the Neural Engine without sending data to external servers, and Apple's privacy commitments explicitly state they do not use users' private personal data to train their foundation models.
What You Can Do Right Now
Whether you're an individual professional or managing an enterprise, the steps to protect your meeting data from being monetized are clear:
- Audit your tools: Review the privacy policies of every AI meeting tool in use at your organization. Search specifically for terms like "training," "improvement," "de-identified," and "third-party."
- Verify consent mechanisms: Ensure all meeting participants — not just the account holder — are providing informed consent before any recording begins.
- Opt out of AI training: Where available, disable settings that allow the vendor to use your data for model training. But recognize that opting out through settings is weaker protection than an architecture that makes collection impossible.
- Switch to on-device processing: Adopt meeting transcription tools that process audio locally, eliminating the secondary use risk at the architectural level.
- Document your policies: Create and enforce clear organizational policies about which AI tools are approved, how consent is obtained, and how meeting data is handled.
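The audit step above can be partly automated. The following Python is a minimal sketch that scans a privacy policy's text for the suggested terms and returns the sentences that mention them. The function name `find_policy_terms` and the sample policy text are invented for illustration, the sentence splitting is deliberately naive, and a match only flags a passage for a human to read. It is not a legal assessment.

```python
import re

# Terms the audit step suggests searching for; extend as needed.
AUDIT_TERMS = ["training", "improvement", "de-identified", "third-party"]


def find_policy_terms(policy_text: str, terms=AUDIT_TERMS) -> dict[str, list[str]]:
    """Map each audit term to the policy sentences that mention it."""
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", policy_text.strip())
    hits: dict[str, list[str]] = {}
    for term in terms:
        # Let a hyphen in the term also match a space ("third party").
        pattern = r"[-\s]".join(re.escape(part) for part in term.split("-"))
        regex = re.compile(pattern, re.IGNORECASE)
        matched = [s for s in sentences if regex.search(s)]
        if matched:
            hits[term] = matched
    return hits


# Made-up policy text, used only to exercise the function.
sample_policy = (
    "We transcribe your meetings. "
    "We may use de-identified audio for training our models. "
    "Data may be shared with third party providers."
)

hits = find_policy_terms(sample_policy)
for term, sentences in hits.items():
    print(f"{term}: {sentences}")
```

Matching is by substring, so a term like "training" will also flag unrelated words that contain it. For a first-pass audit, over-flagging is usually the safer failure mode.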
🔒 Stop Being Training Data. Start Owning Your Meetings.
Basil AI processes everything on-device using Apple's Speech Recognition. Your meetings never touch a server, never train an AI model, and never become someone else's product. 100% private by architecture, not by promise.
The Bottom Line
The cloud transcription industry built a business model on a simple bet: that users wouldn't read the fine print, that meeting participants wouldn't realize they were being recorded, and that the secondary use of conversation data for AI training would fly under the radar.
That bet is no longer paying off. Class action lawsuits are piling up. Regulators are issuing billion-euro fines. And catastrophic breaches are exposing just how dangerous it is to store vast archives of recorded conversations on cloud servers.
Your meetings contain some of the most sensitive information in your professional life — strategic plans, financial projections, personnel decisions, legal discussions, and client confidences. That information shouldn't be training data for a company you've never heard of, stored on a server you can't control, protected by security you can't verify.
The technology to keep your meetings truly private already exists. It runs on the device in your pocket. The only question is whether you'll choose to use it before your next meeting becomes someone else's training data.