Speaker diarization—the AI feature that identifies who said what in a meeting—is one of the most useful capabilities in modern transcription tools. It transforms a wall of text into a readable, attributed conversation. But there's a dark side that almost nobody talks about: to identify speakers, cloud transcription services must build and store voiceprint databases. And unlike a password, you can never change your voice.
In early 2026, a Wired investigation into biometric data collection highlighted how companies are amassing biometric databases with minimal oversight. Voice data is now at the center of this conversation, and the meeting transcription industry is one of the largest collectors of voice biometrics on the planet.
What Is Speaker Diarization and Why Does It Require Voiceprints?
Speaker diarization is the process of partitioning an audio stream to determine "who spoke when." To accomplish this, the AI model must:
- Extract voice embeddings — Mathematical representations of each speaker's unique vocal characteristics (pitch, cadence, tone, speaking rhythm)
- Cluster similar embeddings — Group speech segments that belong to the same voice
- Label speakers — Assign identifiers ("Speaker 1," "Speaker 2") or map them to known contacts
The problem is step one. Those voice embeddings are biometric data—a mathematical fingerprint of your voice that is as unique as your face or your actual fingerprint. When a cloud service performs speaker diarization, those embeddings are computed and stored on their servers.
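The three-step pipeline above can be sketched in a few lines of code. This is an illustrative toy, not any vendor's actual algorithm: real systems use neural embeddings with hundreds of dimensions and more sophisticated clustering, but even this greedy cosine-similarity version shows why an embedding functions as a reusable fingerprint.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def cluster_segments(embeddings, threshold=0.75):
    """Greedy clustering: assign each segment to the most similar existing
    speaker, or start a new speaker if nothing clears the threshold."""
    centroids = []  # one running centroid per discovered speaker
    labels = []
    for emb in embeddings:
        best, best_sim = None, threshold
        for i, centroid in enumerate(centroids):
            sim = cosine_similarity(emb, centroid)
            if sim > best_sim:
                best, best_sim = i, sim
        if best is None:
            centroids.append(list(emb))
            labels.append(len(centroids) - 1)
        else:
            centroids[best] = [(c + e) / 2 for c, e in zip(centroids[best], emb)]
            labels.append(best)
    return [f"Speaker {i + 1}" for i in labels]

# Toy 3-dimensional "embeddings"; real systems use hundreds of dimensions.
segments = [
    [0.90, 0.10, 0.00],  # voice A
    [0.10, 0.90, 0.10],  # voice B
    [0.88, 0.12, 0.02],  # voice A again
]
print(cluster_segments(segments))  # → ['Speaker 1', 'Speaker 2', 'Speaker 1']
```

Note that the centroids are the voiceprints: once computed, they can be compared against any future audio, which is exactly why where they are stored matters.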
How Cloud Transcription Services Store Your Voice Identity
Most cloud transcription platforms retain far more than just text. To power speaker identification across meetings, they maintain persistent speaker profiles. Here's what the major players do:
Otter.ai's Speaker Identification System
Otter.ai's privacy policy describes how they collect "voice data" and "audio recordings" to improve their services. Their speaker identification feature explicitly builds profiles that persist across conversations, learning to recognize recurring participants. This means every meeting you attend through Otter adds data points to a growing voiceprint profile tied to your identity.
As we covered in our analysis of how free transcription apps monetize your voice data, these profiles aren't just used for transcription—they form the backbone of a biometric database with significant commercial value.
Fireflies.ai and Cross-Meeting Speaker Tracking
Fireflies.ai's privacy policy states they process audio recordings and associated metadata. Their speaker intelligence feature tracks individuals across multiple meetings, building comprehensive communication profiles that include speaking patterns, frequency of participation, and topic engagement.
Zoom AI Companion's Voice Analysis
Zoom's privacy policy grants broad rights over meeting content processed by their AI Companion feature. When AI Companion generates meeting summaries with speaker attribution, it must process and retain voice characteristics to distinguish between participants. This data flows through Zoom's cloud infrastructure alongside the content of your conversations.
The Biometric Data Problem: Why Voiceprints Are Different
A 2025 study covered by TechCrunch demonstrated that voiceprint databases present unique privacy risks exceeding those of other forms of data collection:
- Permanence: You can change a password, rotate an API key, or get a new credit card. You cannot change your voice. A compromised voiceprint is permanent.
- Cross-platform identification: A voiceprint extracted from a Zoom meeting can identify you on a podcast, a phone call, a voice message, or any other audio recording.
- Retroactive identification: Once a voiceprint is in a database, it can be matched against past recordings. You can be identified in audio captured before you ever consented to speaker diarization.
- Deepfake enablement: Detailed voice embeddings provide the raw material for AI voice cloning—exactly the data needed to create convincing audio deepfakes of your voice.
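The cross-platform and retroactive risks follow directly from how embeddings work: the same comparison that labels speakers within one meeting can match a voice against any other recording, past or future. A minimal sketch (the names, vectors, and threshold are invented for illustration, not drawn from any real system):

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Hypothetical leaked voiceprint database (identities and vectors invented).
voiceprint_db = {
    "alice@example.com": [0.91, 0.08, 0.02],
    "bob@example.com":   [0.12, 0.88, 0.15],
}

def identify(query_embedding, db, threshold=0.8):
    """Return the closest enrolled identity, or None below the threshold."""
    best_name, best_sim = None, threshold
    for name, print_vec in db.items():
        sim = cosine_similarity(query_embedding, print_vec)
        if sim > best_sim:
            best_name, best_sim = name, sim
    return best_name

# An embedding extracted from an entirely different recording (a podcast, say)
podcast_clip = [0.89, 0.10, 0.03]
print(identify(podcast_clip, voiceprint_db))  # → alice@example.com
```

Rotating a password invalidates old matches; nothing analogous exists here, because the underlying vector is derived from your anatomy.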
Legal Frameworks Struggling to Keep Up
Biometric data is increasingly regulated, but enforcement lags far behind the technology:
GDPR's Biometric Data Protections
Under Article 9 of the GDPR, biometric data processed for the purpose of uniquely identifying a natural person is classified as a "special category" of personal data. This means voiceprint processing requires explicit consent: not a checkbox buried in a Terms of Service, but genuine, informed, freely given consent for the specific purpose of biometric processing.
Most cloud transcription services bury voiceprint collection under general "service improvement" language. This is almost certainly insufficient under GDPR's strict requirements for biometric consent.
Illinois BIPA: The Gold Standard Under Attack
The Illinois Biometric Information Privacy Act (BIPA) remains the strongest biometric privacy law in the United States. It requires:
- Written informed consent before collecting biometric data
- A publicly available data retention and destruction schedule
- Prohibition on selling or profiting from biometric data
- A private right of action allowing individuals to sue for violations
BIPA has been the basis for landmark settlements against companies including Facebook and Clearview AI. Yet most cloud transcription services operate as if BIPA doesn't apply to them—a legal position that is increasingly untenable as courts expand the scope of biometric privacy protection.
For a broader look at the legal landscape surrounding AI meeting recording, see our article on consent laws for AI notetakers in 2026.
New State Laws Emerging
Texas, Washington, and Colorado have all enacted biometric privacy laws, with more states following. The patchwork of regulations means that a single Zoom meeting with participants across multiple states could trigger compliance obligations under several different biometric privacy statutes simultaneously.
The Deepfake Connection: From Voiceprints to Voice Clones
Perhaps the most alarming risk of cloud-stored voiceprints is their utility for voice cloning. Modern AI voice synthesis can create convincing replicas of a person's voice with as little as three seconds of sample audio. Cloud transcription services store hours upon hours of clean, labeled audio for each speaker.
A report by The Verge documented cases of AI voice cloning being used for corporate fraud—impersonating executives to authorize wire transfers. The raw material for these attacks? Clean audio samples, exactly like those stored by cloud transcription platforms.
Consider the attack surface:
- A CEO uses Otter.ai for six months of executive meetings
- A data breach exposes the audio database (or a malicious insider accesses it)
- Attackers now have hours of the CEO's voice, more than enough to clone it convincingly
- They use the cloned voice to call the CFO and authorize a fraudulent transfer
This isn't theoretical. It's happening. And cloud-stored voiceprints make such attacks dramatically easier.
The Consent Problem: Who Actually Agreed?
Here's a critical point that most people miss: in a multi-person meeting, the person who activated the transcription bot may have "agreed" to the terms of service. But what about everyone else on the call?
When a meeting host activates Otter, Fireflies, or Zoom AI Companion, every participant's voice is processed for speaker diarization. Their voiceprints are extracted and stored. But those participants may have never:
- Created an account with the service
- Read or agreed to the privacy policy
- Consented to biometric data collection
- Been informed that their voiceprint would be stored indefinitely
A brief notification that a "bot" has joined the meeting does not constitute informed consent for biometric data processing under any serious privacy framework. Yet this is the standard practice across the industry.
How On-Device Speaker Diarization Eliminates the Risk
On-device speaker diarization solves this problem at the architectural level. When speaker identification runs entirely on your device:
- No voiceprint database exists on any server. Voice embeddings are computed locally and never leave the device.
- No cross-meeting tracking is possible by third parties. The service provider has zero access to voice characteristics.
- No biometric data breach risk. You can't breach data that was never collected.
- No deepfake raw material. Clean, labeled audio stays on the device owner's hardware.
- No third-party consent gap. Because no outside company ever receives voice data, there is no cloud processor for participants to be asked to consent to.
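One way to picture the architectural difference: in an on-device design, the voice embeddings are ordinary local variables that go out of scope when processing finishes. The sketch below is hypothetical (the function names and types are invented, and a real app would use platform frameworks rather than Python), but it captures the key invariant: only text labels survive processing.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Segment:
    audio: list  # raw samples; never leave local memory

def transcribe_on_device(
    segments: List[Segment],
    extract_embedding: Callable,  # local model (e.g. run on a neural engine)
    cluster: Callable,            # assigns "Speaker N" labels locally
    transcribe: Callable,         # local speech-to-text
) -> List[str]:
    """Every step runs locally; note there is no network call anywhere."""
    embeddings = [extract_embedding(s.audio) for s in segments]
    labels = cluster(embeddings)
    transcript = [
        f"{label}: {transcribe(s.audio)}" for label, s in zip(labels, segments)
    ]
    # `embeddings` goes out of scope on return: after this point, no
    # biometric representation of any participant exists anywhere.
    return transcript
```

Deleting the local recording and transcript then removes the only artifacts; there is no server-side copy to chase down.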
Basil AI performs all speaker diarization using Apple's on-device Speech framework, leveraging the Apple Neural Engine built into every modern iPhone and Mac. Voice embeddings are computed on the device's secure hardware and never transmitted anywhere.
- All audio processing happens on-device using Apple's Neural Engine
- Voice embeddings are computed locally and stay on your hardware
- No voiceprint data is ever transmitted to any server
- Speaker labels exist only in your local transcript
- Deleting the recording destroys all associated voice data permanently
What You Can Do Right Now
If you're concerned about voiceprint privacy in your meetings, take these steps immediately:
1. Audit Your Current Transcription Tools
Check the privacy policies of every transcription service you use. Search for terms like "biometric," "voice data," "speaker identification," and "voice embedding." Understand what's being stored and for how long.
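If you have several policies to review, this kind of term search is easy to script. A minimal sketch (the watch-term list and the sample text are my own, not drawn from any vendor's actual policy):

```python
import re

BIOMETRIC_TERMS = [
    "biometric", "voice data", "voiceprint", "voice embedding",
    "speaker identification", "speaker profile", "retention",
]

def audit_policy(policy_text, terms=BIOMETRIC_TERMS):
    """Return each watch-term along with the sentences that mention it."""
    findings = {}
    sentences = re.split(r"(?<=[.!?])\s+", policy_text)
    for term in terms:
        hits = [s.strip() for s in sentences if term.lower() in s.lower()]
        if hits:
            findings[term] = hits
    return findings

# Invented sample text standing in for a real privacy policy.
policy = ("We collect Voice Data to improve our services. "
          "Speaker identification profiles may be retained indefinitely.")
for term, hits in audit_policy(policy).items():
    print(f"{term}: {hits}")
```

Flagged sentences still need human reading, but the script quickly shows which policies mention biometric collection at all, and which are silent on retention.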
2. Request Your Biometric Data
Under GDPR, CCPA, and state biometric privacy laws, you have the right to request a copy of all biometric data a company holds about you—and the right to demand its deletion. Exercise these rights.
3. Object to Bot Recording
When a transcription bot joins your meeting, you have every right to object. Ask the host to disable it, or state on the record that you do not consent to biometric data collection.
4. Switch to On-Device Processing
The only way to completely eliminate voiceprint privacy risk is to ensure voice data never leaves the device. Tools like Basil AI that process everything locally make speaker diarization safe by design.
5. Advocate for Organizational Policies
Push your organization to adopt meeting transcription policies that address biometric data. Require privacy impact assessments for any tool that performs speaker identification.
The Future of Voice Biometrics Regulation
The regulatory landscape is shifting rapidly. The EU AI Act now classifies biometric identification systems as "high-risk," requiring rigorous compliance documentation. Several US states are considering BIPA-style laws. And class action attorneys are increasingly targeting companies that collect voice biometrics without proper consent.
Organizations that continue to use cloud-based speaker diarization without explicit biometric consent are building significant legal liability. The smart move—both ethically and legally—is to adopt on-device solutions that eliminate the problem entirely.