
Speaker diarization—the AI feature that identifies who said what in a meeting—is one of the most useful capabilities in modern transcription tools. It transforms a wall of text into a readable, attributed conversation. But there's a dark side that almost nobody talks about: to identify speakers, cloud transcription services must build and store voiceprint databases. And unlike a password, you can never change your voice.

In early 2026, a Wired investigation into biometric data collection highlighted how companies are amassing biometric databases with minimal oversight. Voice data is now at the center of this conversation, and the meeting transcription industry is one of the largest collectors of voice biometrics on the planet.

What Is Speaker Diarization and Why Does It Require Voiceprints?

Speaker diarization is the process of partitioning an audio stream to determine "who spoke when." To accomplish this, the AI model must:

  1. Extract voice embeddings — Mathematical representations of each speaker's unique vocal characteristics (pitch, cadence, tone, speaking rhythm)
  2. Cluster similar embeddings — Group speech segments that belong to the same voice
  3. Label speakers — Assign identifiers ("Speaker 1," "Speaker 2") or map them to known contacts
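The three-step pipeline above can be sketched in a few lines. This is a toy illustration with hypothetical two-dimensional embeddings (real voice embeddings have hundreds of dimensions and come from a neural model), using a simple greedy clustering by cosine similarity:

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def diarize(embeddings, threshold=0.9):
    """Greedy clustering: assign each speech segment to the most similar
    existing speaker cluster, or start a new speaker if none is close."""
    centroids, labels = [], []
    for emb in embeddings:
        best, best_sim = None, threshold
        for i, c in enumerate(centroids):
            sim = cosine(emb, c)
            if sim >= best_sim:
                best, best_sim = i, sim
        if best is None:
            centroids.append(list(emb))
            labels.append(f"Speaker {len(centroids)}")
        else:
            # Update the running centroid (simple average with the new segment).
            centroids[best] = [(x + y) / 2 for x, y in zip(centroids[best], emb)]
            labels.append(f"Speaker {best + 1}")
    return labels

# Mock embeddings for five speech segments from two distinct voices.
segments = [[1.0, 0.1], [0.98, 0.12], [0.1, 1.0], [0.99, 0.09], [0.08, 0.97]]
print(diarize(segments))
# -> ['Speaker 1', 'Speaker 1', 'Speaker 2', 'Speaker 1', 'Speaker 2']
```

Note that even this toy version has to hold on to the per-speaker centroids — in a cloud deployment, those are exactly the voiceprints that get stored.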

The problem is step one. Those voice embeddings are biometric data—a mathematical fingerprint of your voice that is as unique as your face or your actual fingerprint. When a cloud service performs speaker diarization, those embeddings are computed and stored on their servers.

⚠️ Why this matters: Unlike a compromised password, you cannot reset your voice. A leaked voiceprint is a permanent biometric exposure that can be used to identify you across any audio recording, anywhere, forever.
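To make the permanence concrete, here is a hypothetical illustration (mock vectors, not real embeddings) of how a leaked voiceprint can be matched against any future recording with nothing more than a similarity check:

```python
import math

def matches(voiceprint, sample, threshold=0.9):
    """True if a new audio sample's embedding matches a stored voiceprint."""
    dot = sum(x * y for x, y in zip(voiceprint, sample))
    norm = (math.sqrt(sum(x * x for x in voiceprint))
            * math.sqrt(sum(x * x for x in sample)))
    return dot / norm >= threshold

leaked = [0.8, 0.6]           # voiceprint exfiltrated in a breach
later_podcast = [0.79, 0.62]  # same person, years later, different recording
print(matches(leaked, later_podcast))  # -> True
```

A password leak is fixed by rotating the password; there is no equivalent rotation for the vector above.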

How Cloud Transcription Services Store Your Voice Identity

Most cloud transcription platforms retain far more than just text. To power speaker identification across meetings, they maintain persistent speaker profiles. Here's what the major players do:

Otter.ai's Speaker Identification System

Otter.ai's privacy policy describes how they collect "voice data" and "audio recordings" to improve their services. Their speaker identification feature explicitly builds profiles that persist across conversations, learning to recognize recurring participants. This means every meeting you attend through Otter adds data points to a growing voiceprint profile tied to your identity.

As we covered in our analysis of how free transcription apps monetize your voice data, these profiles aren't just used for transcription—they form the backbone of a biometric database with significant commercial value.

Fireflies.ai and Cross-Meeting Speaker Tracking

Fireflies.ai's privacy policy states they process audio recordings and associated metadata. Their speaker intelligence feature tracks individuals across multiple meetings, building comprehensive communication profiles that include speaking patterns, frequency of participation, and topic engagement.

Zoom AI Companion's Voice Analysis

Zoom's privacy policy grants broad rights over meeting content processed by their AI Companion feature. When AI Companion generates meeting summaries with speaker attribution, it must process and retain voice characteristics to distinguish between participants. This data flows through Zoom's cloud infrastructure alongside the content of your conversations.

The Biometric Data Problem: Why Voiceprints Are Different

A 2025 study covered by TechCrunch demonstrated that voiceprint databases carry privacy risks exceeding those of other forms of data collection.

Think about this: Every meeting you attend on a cloud transcription platform contributes to a voiceprint that could identify you across the internet. The more meetings, the more accurate the print. And you likely never explicitly consented to biometric data collection.

Legal Frameworks Struggling to Keep Up

Biometric data is increasingly regulated, but enforcement lags far behind the technology:

GDPR's Biometric Data Protections

Under Article 9 of the GDPR, biometric data processed for the purpose of uniquely identifying a natural person is classified as a "special category" of personal data. This means voiceprint processing requires explicit consent—not the buried checkbox in a Terms of Service, but genuine, informed, freely-given consent for the specific purpose of biometric processing.

Most cloud transcription services bury voiceprint collection under general "service improvement" language. This is almost certainly insufficient under GDPR's strict requirements for biometric consent.

Illinois BIPA: The Gold Standard Under Attack

The Illinois Biometric Information Privacy Act (BIPA) remains the strongest biometric privacy law in the United States. It requires:

  1. Informed written consent before any biometric identifier is collected
  2. A publicly available policy stating retention and destruction schedules
  3. A strict prohibition on selling or otherwise profiting from biometric data

BIPA has been the basis for landmark settlements against companies including Facebook and Clearview AI. Yet most cloud transcription services operate as if BIPA doesn't apply to them—a legal position that is increasingly untenable as courts expand the scope of biometric privacy protection.

For a broader look at the legal landscape surrounding AI meeting recording, see our article on consent laws for AI notetakers in 2026.

New State Laws Emerging

Texas, Washington, and Colorado have all enacted biometric privacy laws, with more states following. The patchwork of regulations means that a single Zoom meeting with participants across multiple states could trigger compliance obligations under several different biometric privacy statutes simultaneously.

The Deepfake Connection: From Voiceprints to Voice Clones

Perhaps the most alarming risk of cloud-stored voiceprints is their utility for voice cloning. Modern AI voice synthesis can create convincing replicas of a person's voice with as little as three seconds of sample audio. Cloud transcription services store hours upon hours of clean, labeled audio for each speaker.

A report by The Verge documented cases of AI voice cloning being used for corporate fraud—impersonating executives to authorize wire transfers. The raw material for these attacks? Clean audio samples, exactly like those stored by cloud transcription platforms.

Consider the attack surface:

  1. A CEO uses Otter.ai for six months of executive meetings
  2. A data breach exposes the audio database (or a malicious insider accesses it)
  3. Attackers now have hours of the CEO's voice—enough to clone it perfectly
  4. They use the cloned voice to call the CFO and authorize a fraudulent transfer

This isn't theoretical. It's happening. And cloud-stored voiceprints make it exponentially easier.

The Consent Problem: Who Actually Agreed?

Here's a critical point that most people miss: in a multi-person meeting, the person who activated the transcription bot may have "agreed" to the terms of service. But what about everyone else on the call?

When a meeting host activates Otter, Fireflies, or Zoom AI Companion, every participant's voice is processed for speaker diarization. Their voiceprints are extracted and stored. But those participants may never have read the vendor's terms of service, agreed to biometric processing, or even heard of the tool before it appeared in their call.

A brief notification that a "bot" has joined the meeting does not constitute informed consent for biometric data processing under any serious privacy framework. Yet this is the standard practice across the industry.

How On-Device Speaker Diarization Eliminates the Risk

On-device speaker diarization solves this problem at the architectural level. When speaker identification runs entirely on your device, voice embeddings never leave your hardware, no central voiceprint database exists to breach, and there is nothing for a vendor to sell, subpoena, or lose.

Basil AI performs all speaker diarization using Apple's on-device Speech framework, leveraging the Apple Neural Engine built into every modern iPhone and Mac. Voice embeddings are computed on the device's secure hardware and never transmitted anywhere.

🔒 How Basil AI handles speaker identification: voice embeddings are computed on the device's secure hardware, used only to label speakers in your transcript, and never transmitted, synced, or stored on any server.

What You Can Do Right Now

If you're concerned about voiceprint privacy in your meetings, take these steps immediately:

1. Audit Your Current Transcription Tools

Check the privacy policies of every transcription service you use. Search for terms like "biometric," "voice data," "speaker identification," and "voice embedding." Understand what's being stored and for how long.
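A quick sketch of this audit step: scan a saved copy of a policy for the red-flag terms listed above. The policy text here is a hypothetical example, not quoted from any real vendor:

```python
import re

RED_FLAGS = ["biometric", "voice data", "speaker identification", "voice embedding"]

def audit_policy(text):
    """Return each red-flag term found in the policy, with a short excerpt
    of surrounding context for review."""
    hits = {}
    for term in RED_FLAGS:
        m = re.search(re.escape(term), text, re.IGNORECASE)
        if m:
            start = max(0, m.start() - 40)
            hits[term] = text[start:m.end() + 40].strip()
    return hits

policy = ("We may retain Voice Data and audio recordings to improve "
          "speaker identification across your meetings.")
for term, excerpt in audit_policy(policy).items():
    print(f"{term!r}: ...{excerpt}...")
```

Any hit is a prompt to read the surrounding clause closely — especially the retention period and whether the data is shared with "partners."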

2. Request Your Biometric Data

Under GDPR, CCPA, and state biometric privacy laws, you have the right to request a copy of all biometric data a company holds about you—and the right to demand its deletion. Exercise these rights.

3. Object to Bot Recording

When a transcription bot joins your meeting, you have every right to object. Ask the host to disable it, or state on the record that you do not consent to biometric data collection.

4. Switch to On-Device Processing

The only way to completely eliminate voiceprint privacy risk is to ensure voice data never leaves the device. Tools like Basil AI that process everything locally make speaker diarization safe by design.

5. Advocate for Organizational Policies

Push your organization to adopt meeting transcription policies that address biometric data. Require privacy impact assessments for any tool that performs speaker identification.

The Future of Voice Biometrics Regulation

The regulatory landscape is shifting rapidly. The EU AI Act now classifies biometric identification systems as "high-risk," requiring rigorous compliance documentation. Several US states are considering BIPA-style laws. And class action attorneys are increasingly targeting companies that collect voice biometrics without proper consent.

Organizations that continue to use cloud-based speaker diarization without explicit biometric consent are building significant legal liability. The smart move—both ethically and legally—is to adopt on-device solutions that eliminate the problem entirely.

Keep Your Voice Private

Basil AI performs speaker diarization 100% on-device. No voiceprint databases. No cloud processing. No biometric risk. Your voice stays yours.
