June 29, 2026 · 11 min read

How to Run Apple Intelligence's Neural Engine for Meeting Transcription (Without a Subscription)

Key takeaways
  • Apple's Neural Engine has scaled from 0.6 TOPS in 2017 to roughly 133 TOPS on M5 — enough to run meeting transcription and summarization entirely on-device.
  • iOS 26's new SpeechAnalyzer API is on-device-only and tuned for long-form audio like lectures and meetings, replacing the older one-minute-limited SFSpeechRecognizer.
  • Apple Foundation Models 3, released June 8 2026, exposes a 3B-parameter on-device LLM to third-party apps via Swift — no API fees, no quotas, no cloud round-trip.
  • Cloud transcription services charge per minute and retain audio; on-device processing has zero variable cost and leaves nothing on a vendor's servers.
  • Basil AI uses these system frameworks so every word of your meeting stays on your iPhone or Mac, even in airplane mode.

Quick answer: Apple's Neural Engine — paired with the iOS 26 SpeechAnalyzer API and on-device Apple Foundation Models — can transcribe and summarize meetings entirely on your iPhone or Mac with no cloud subscription. You need an A12 Bionic or newer iPhone (iPhone XS+), or any Apple Silicon Mac. Apps like Basil AI tap these system frameworks to deliver real-time transcription that never leaves your device.

Every modern iPhone and Mac contains a neural processing unit powerful enough to transcribe and summarize an 8-hour meeting without ever touching the cloud. Here is how the stack works, what hardware you need, and why subscription transcription services are increasingly hard to justify.

If you have paid for Otter, Fireflies, or a Zoom AI Companion seat in the last twelve months, you have effectively rented compute you already own. The Neural Engine inside your iPhone and Mac — the same silicon Apple uses to power Face ID, Siri, and Apple Intelligence — is fully capable of running real-time meeting transcription and language-model summarization on-device. With the SpeechAnalyzer API that shipped in iOS 26 and the third generation of Apple Foundation Models announced at WWDC 2026, that capability is now exposed to any third-party app through a public Swift framework, with no per-minute fees and no audio leaving the device.

This guide is a technical walkthrough — aimed at professionals choosing tools, not just engineers shipping code — of how Apple's on-device AI stack actually works for meetings, what hardware you need, where the limits are, and how an app like Basil AI uses these frameworks to deliver transcription and summarization without a subscription tier sitting between you and your own audio.

The Neural Engine: from Face ID to on-device LLMs in seven years

Apple's Neural Engine (ANE) is a dedicated neural processing unit that has shipped in every A-series and M-series chip since 2017. Wikipedia's Neural Engine entry notes that it was first introduced with the A11 Bionic in the iPhone 8, iPhone 8 Plus, and iPhone X, and that Apple services including Siri, Face ID, and Apple Intelligence run on the Neural Engine on-device, which the company cites as a privacy benefit.

The trajectory is striking. According to PatSnap's Apple Silicon roadmap analysis, the Neural Engine grew from 0.6 TOPS in the A11 Bionic in 2017 to 38 TOPS in the M4 in 2024 — a 63× improvement in seven years that enabled the chip to run transformer-based large language models entirely on-device without cloud connectivity. Apple's M5 announcement pushed that further, claiming over 4× peak GPU compute for AI compared to M4 by adding a Neural Accelerator to each GPU core, with 153 GB/s of unified memory bandwidth to feed it.
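
The growth figures above are simple ratios and can be checked directly. Here is a minimal sketch in Python that uses only the TOPS numbers quoted in this section. The function names are illustrative, and the annualized rate assumes smooth compounding between 2017 and 2024, which real chip generations do not follow exactly.

```python
# TOPS figures as quoted in the article (trillions of operations per second).
A11_TOPS = 0.6   # A11 Bionic, 2017
M4_TOPS = 38.0   # M4, 2024


def growth_factor(old_tops: float, new_tops: float) -> float:
    """Return how many times larger new_tops is than old_tops."""
    return new_tops / old_tops


def annualized_growth(old_tops: float, new_tops: float, years: int) -> float:
    """Return the compound annual growth rate as a fraction (0.5 means 50%)."""
    return growth_factor(old_tops, new_tops) ** (1 / years) - 1


factor = growth_factor(A11_TOPS, M4_TOPS)
yearly = annualized_growth(A11_TOPS, M4_TOPS, years=2024 - 2017)

print(f"A11 to M4: about {factor:.0f}x")
print(f"Compound growth: about {yearly:.0%} per year")
```

The ratio comes out at about 63×, matching the quoted figure, and works out to a little over 80% compound growth per year across the seven-year span.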

What does that mean in practical terms? A two-hour board meeting is roughly 20–25 MB of compressed audio. Running that through an on-device speech model and a 3-billion-parameter summarization model on a chip rated for tens of trillions of operations per second is, frankly, trivial. The cloud round-trip exists for billing reasons, not technical ones.
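
The 20–25 MB estimate can be reproduced with back-of-the-envelope arithmetic. The sketch below assumes constant-bitrate speech audio at 24 kbps. That bitrate is an illustrative assumption, not a figure from Apple or from any specific app, and the function name is hypothetical.

```python
def compressed_size_mb(duration_hours: float, bitrate_kbps: float) -> float:
    """Estimate file size in megabytes (10^6 bytes) for constant-bitrate audio."""
    seconds = duration_hours * 3600
    total_bits = bitrate_kbps * 1000 * seconds
    return total_bits / 8 / 1_000_000


# Assumption: 24 kbps is a plausible speech-codec bitrate, not a measured value.
two_hour_meeting_mb = compressed_size_mb(duration_hours=2, bitrate_kbps=24)
print(f"{two_hour_meeting_mb:.1f} MB")
```

At 24 kbps a two-hour recording comes to 21.6 MB. The quoted 20–25 MB range corresponds to bitrates of roughly 22 to 28 kbps.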

Which devices qualify

Apple's WWDC 2019 "Advances in Speech Recognition" session established the floor for on-device recognition with the older SFSpeechRecognizer API: all iPhones and iPads with Apple A9 or later processors are supported, and all Mac devices are supported. The newer stack sets a higher floor. SpeechAnalyzer requires iOS 26 or macOS 26, and Apple Intelligence features generally require an iPhone 15 Pro or later, or any Apple Silicon Mac.

SpeechAnalyzer: the on-device API built for meetings

For nearly a decade, the workhorse for on-device transcription on iOS was SFSpeechRecognizer, an object used to check availability of and initiate speech recognition. It worked, but it had real constraints. As Picovoice's 2026 iOS speech recognition guide documents, SFSpeechRecognizer imposed a hard one-minute limit per recognition session and a rate limit of 1,000 requests per device per hour — useful for short dictation, painful for hour-long meetings.
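
To see why a one-minute cap is painful for meetings, count the sessions an app would have to chain together. This is a rough sketch in Python that takes the one-minute limit cited above at face value. The helper names are hypothetical, and it ignores any overlap an app might add between sessions to avoid cutting words.

```python
import math

SESSION_LIMIT_SECONDS = 60  # one-minute cap per recognition session, as cited above


def sessions_needed(meeting_minutes: float) -> int:
    """Number of back-to-back recognition sessions needed to cover a meeting."""
    return math.ceil(meeting_minutes * 60 / SESSION_LIMIT_SECONDS)


def session_boundaries(meeting_minutes: float) -> int:
    """Number of hand-offs between sessions, each a spot where a word can be split."""
    return max(sessions_needed(meeting_minutes) - 1, 0)


one_hour = sessions_needed(60)
full_day = sessions_needed(8 * 60)
print(f"1-hour meeting: {one_hour} sessions, {session_boundaries(60)} hand-offs")
print(f"8-hour recording: {full_day} sessions")
```

Every hand-off is a place where the app has to restart recognition and stitch partial results together. A long-form API removes that bookkeeping.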

iOS 26 changed that. Apple's WWDC 2025 SpeechAnalyzer session introduced a new model that is both faster and more flexible than the one previously available through SFSpeechRecognizer, and is good for long-form and distant audio such as lectures, meetings, and conversations. Apple itself uses the new model in Notes, Voice Memos, Journal, and the Call Summarization feature in Phone.

How the API is structured

The architecture is modular. As Callstack's SpeechAnalyzer explainer describes, the SpeechAnalyzer class coordinates the process by managing attached modules, receiving incoming audio, and controlling the analysis workflow. In iOS 26, two modules are public: SpeechTranscriber for speech-to-text, and SpeechDetector for voice activity detection. Modules can be added or removed dynamically during a session.
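
The coordinator-plus-modules shape described above can be illustrated with a toy model. The following Python sketch is not Apple's API and does no speech recognition. Every class and method name in it is hypothetical, and it only shows the pattern: one coordinator receives audio and fans it out to modules that can be attached or removed mid-session.

```python
from dataclasses import dataclass, field


@dataclass
class ToyVoiceDetector:
    """Hypothetical voice-activity module: flags chunks whose peak is loud enough."""
    threshold: float = 0.1
    speech_flags: list = field(default_factory=list)

    def process(self, chunk):
        peak = max((abs(sample) for sample in chunk), default=0.0)
        self.speech_flags.append(peak >= self.threshold)


@dataclass
class ToyChunkCounter:
    """Hypothetical stand-in for a transcriber: only counts the chunks it receives."""
    chunks_seen: int = 0

    def process(self, chunk):
        self.chunks_seen += 1


class ToyAnalyzer:
    """Coordinator: tracks attached modules and fans each audio chunk out to them."""

    def __init__(self):
        self._modules = {}

    def attach(self, name, module):
        self._modules[name] = module

    def detach(self, name):
        self._modules.pop(name, None)

    def feed(self, chunk):
        for module in list(self._modules.values()):
            module.process(chunk)


detector = ToyVoiceDetector()
counter = ToyChunkCounter()
analyzer = ToyAnalyzer()
analyzer.attach("detector", detector)
analyzer.attach("counter", counter)

analyzer.feed([0.0, 0.01, -0.02])   # quiet chunk
analyzer.feed([0.3, -0.5, 0.2])     # loud chunk
analyzer.detach("counter")          # modules can be removed mid-session
analyzer.feed([0.0, 0.4])           # only the detector sees this chunk
```

After the three chunks, the detector has recorded one quiet and two loud chunks, while the counter stopped at two because it was detached before the last one arrived.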

The model itself is downloaded via Apple's AssetInventory API, retained in system storage, and shared across apps — so it does not increase the download or storage size of your application or its run-time memory. And crucially, as engineer Blake Crosley's SpeechAnalyzer vs SFSpeechRecognizer breakdown confirms, SpeechAnalyzer is an on-device-only framework with no server-side path — the framework's value is precisely the on-device privacy and zero-cost-per-call story.

Accuracy: how it compares to Whisper

Independent benchmarks back up Apple's claims. Argmax's WhisperKit benchmark report found that Apple's SpeechAnalyzer matches the speed and accuracy of mid-tier OpenAI Whisper models on long-form conversational speech transcription. Crosley's breakdown adds that the new proprietary Apple model is reportedly 2× faster than Whisper Large V3 Turbo on equivalent transcription tasks.

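The honest caveat: Argmax also notes that SpeechAnalyzer currently lacks the Custom Vocabulary feature that the older SFSpeechRecognizer supports, which still matters for domain-specific accuracy (legal terms, drug names, proprietary product SKUs). A hybrid approach is possible today: use SpeechAnalyzer for general long-form audio and fall back to SFSpeechRecognizer with custom vocabulary where domain terms dominate. Apple may close that gap in a future release, but it has not done so yet.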

Apple Foundation Models 3: the on-device LLM, now exposed to apps

Transcription is half the problem. The other half is summarization, action-item extraction, and structured note generation — the part most apps need a cloud LLM for. That changed at WWDC 2026.

Apple Machine Learning Research's AFM 3 announcement introduced, on June 8, 2026, the third generation of Apple Foundation Models — a family of five models with a privacy-first architecture. Two of those models run on-device: AFM 3 Core, the next generation of Apple's 3-billion-parameter dense model, and AFM 3 Core Advanced, a more powerful on-device model that is natively multimodal and enables higher-accuracy dictation. Apple's June 2026 newsroom post on Apple Intelligence confirms these latest models run on device and on servers using Private Cloud Compute, and that every facet of the new Apple Intelligence architecture is built privacy-first.

For developers, the unlock is the Foundation Models Framework, a native Swift API that gives direct access to the same on-device model that powers Apple Intelligence. Apps can now work with Apple Foundation Models, cloud models like Claude and Gemini, or any other provider that conforms to a common Language Model protocol — but the on-device path is what matters for privacy-sensitive workflows.

What you can do with it

Apple's earlier 2025 Foundation Models update describes the use cases the on-device model excels at: summarization, entity extraction, text understanding, refinement, short dialog, and generating creative content. That maps almost perfectly to what you want from meeting notes — a TL;DR summary, a list of attendees and decisions, an action-item extraction pass, and a rewrite that turns rough fragments into a polished memo.

What this actually costs (hint: $0 per minute)

The economic story is where on-device transcription becomes hard to argue with. Here is what the major options cost you today, per hour of audio:

Service | Where audio is processed | Cost per hour of audio | Audio retained? | Works offline?
Apple SpeechAnalyzer + AFM 3 (on-device) | Your iPhone or Mac | $0 (one-time app cost) | Only if you save it | Yes
OpenAI Whisper API | OpenAI servers | ~$0.36 ($0.006/min) | Per OpenAI policy | No
AWS Transcribe | AWS servers | ~$1.44 ($0.024/min) | Per AWS policy | No
Otter Business | Otter cloud | ~$30/user/month flat | Yes, indefinitely by default | No
Zoom AI Companion | Zoom cloud | Bundled in Zoom seat | Per Zoom policy | No

Multiply that by a sales team taking five calls a day, or a consulting firm running back-to-back client meetings, and the variable cost of cloud transcription compounds quickly. The on-device path has no variable cost at all — you paid for the Neural Engine when you bought the iPhone.
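
To put numbers on "compounds quickly", here is a small Python sketch using the per-minute rates from the table above. Only the five-calls-a-day figure comes from the article. Team size, call length, and working days are assumptions chosen for illustration. Flat-rate subscriptions such as Otter or Zoom are not modeled.

```python
# Per-minute list prices as quoted in the table above (USD).
RATE_PER_MINUTE = {
    "OpenAI Whisper API": 0.006,
    "AWS Transcribe": 0.024,
    "On-device": 0.0,
}


def cost_per_hour(rate_per_minute: float) -> float:
    """Convert a per-minute rate into a per-hour cost."""
    return rate_per_minute * 60


def monthly_team_cost(rate_per_minute, people, calls_per_day, minutes_per_call, workdays):
    """Total monthly transcription cost for a team at a given per-minute rate."""
    total_minutes = people * calls_per_day * minutes_per_call * workdays
    return rate_per_minute * total_minutes


# Assumptions for illustration: 10 reps, 30-minute calls, 21 working days.
# Only the five-calls-a-day figure comes from the article.
for service, rate in RATE_PER_MINUTE.items():
    monthly = monthly_team_cost(rate, people=10, calls_per_day=5,
                                minutes_per_call=30, workdays=21)
    print(f"{service}: ${cost_per_hour(rate):.2f}/hour, ${monthly:,.2f}/month")
```

Under those assumptions the team records 31,500 minutes a month, which comes to about $189 at the Whisper API rate and about $756 at the AWS Transcribe rate. The on-device line stays at zero.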

The compliance angle: why on-device matters even when it's not the cheapest path

Cost is one reason. Compliance is the other, and for regulated industries it's the more important one. Picovoice's iOS speech recognition guide is blunt about this: for apps subject to HIPAA, GDPR, or CCPA, sending audio to external servers creates compliance risk, while on-device processing sidesteps it because audio stays local, latency stays predictable, and the app keeps working when the network does not.

Under Article 5 of the GDPR, processing must follow data-minimization and purpose-limitation principles. When meeting audio never leaves the recording device, both principles are satisfied by architecture rather than by promise. Similarly, the HHS HIPAA Privacy Rule protects individually identifiable health information — an obligation that gets much easier to meet when PHI in a recorded clinical conversation is never transmitted to a third-party vendor in the first place.

That logic also applies to the privacy policies of the cloud incumbents. Otter.ai's privacy policy describes broad rights to retain and use uploaded audio. Fireflies' privacy policy and Zoom's privacy statement similarly contemplate cloud retention and downstream processing. None of those concerns apply when the audio never leaves the device.

Cloud vs on-device meeting transcription, side by side

Capability | Cloud (Otter, Fireflies, Zoom AI) | On-device (Apple Neural Engine + Basil AI)
Processing location | Vendor servers | Your iPhone or Mac
Audio retention | Days to indefinite per ToS | None unless you save it
Used for model training | Often, with opt-out | Never
Works offline / in airplane mode | No | Yes
Per-minute cost | $0.006–$0.024+ | $0
Latency | Network + inference | Inference only
HIPAA / GDPR posture | BAA + DPA required | No third-party data flow
Voice bot visible to meeting? | Often yes | No

Hardware requirements, decoded

This is the part most readers ask about: "Do I need to buy a new device?" Almost certainly not. Here is the practical breakdown.

iPhone

For SFSpeechRecognizer on-device transcription — the older API — you need A9 or newer (iPhone 6s and up). For SpeechAnalyzer and the new long-form model in iOS 26, the API itself is available on most iPhones running iOS 26. For full Apple Intelligence features including AFM 3 Core Advanced, you need iPhone 15 Pro or newer, which has the memory and Neural Engine throughput required.

Mac

Any Apple Silicon Mac — M1 or later — will run on-device transcription comfortably. The original M1 announcement noted that M1's 16-core Neural Engine is capable of 11 trillion operations per second, enabling up to 15× faster machine learning performance. That was 2020. An M4 hits roughly 38 TOPS and M5 generation chips push neural compute substantially further.

iPad

iPad Pro with M-series silicon is the strongest tablet for this workload. iPad Air and base iPad models with A14 or newer also run on-device transcription well.

How Basil AI uses this stack

Basil AI is built directly on top of Apple's on-device frameworks. The Speech framework handles real-time transcription through Apple's Neural Engine; the Foundation Models framework handles summarization, action-item extraction, and note generation against AFM 3 Core. The audio buffer never leaves the device, no recording is uploaded to any server, and there is no "Basil cloud" sitting between you and your transcript.

That architectural choice has three practical consequences:

  1. It works offline. Airplane mode, secure facilities, international flights, a basement conference room with no signal — all fine. The model is on the device.
  2. It has no per-minute pricing pressure. Because there is no cloud inference bill on the back end, Basil can offer 8-hour continuous recording without metering minutes.
  3. It is structurally compatible with regulated workflows. Attorneys, clinicians, and compliance officers don't have to evaluate a third-party vendor's data-handling because there is no third party in the data path.

If you want a deeper look at the architecture, see our walkthrough of how Basil processes audio locally, and our comparison of Granola vs Basil on bot-free vs on-device architecture. For a more legal-industry-specific take, see our buyer's guide for lawyers.

Limits and honest tradeoffs

On-device is the right default, but it isn't magic. Three things to be clear-eyed about:

None of these is a reason to ship your audio to a cloud vendor by default. They are reasons to pick the right tool for the right meeting.

The bigger picture: edge AI is no longer a downgrade

For most of the last decade, "on-device" meant "less capable." That trade is gone. Apple's third-generation Foundation Models, the Neural Engine's roughly 220× growth in compute from A11 to M5, and a public Swift framework for any developer to plug into all add up to a stack where running AI on your own hardware is genuinely competitive on quality — and decisively better on privacy, latency, offline support, and cost.

The subscription-economics of cloud transcription only made sense when on-device wasn't an option. It is now.

Try fully on-device meeting transcription

Basil AI uses Apple's Speech framework and Foundation Models to keep every word of your meeting on your iPhone or Mac. 8-hour recording, real-time transcripts, smart summaries: no cloud and no per-minute fees.


Frequently Asked Questions

What hardware do I need to run Apple Neural Engine transcription?

Any iPhone with an A12 Bionic chip or newer (iPhone XS, 2018 and later) and any Apple Silicon Mac (M1 or later) includes a Neural Engine capable of on-device transcription. The newest SpeechAnalyzer API requires iOS 26 or macOS 26. Full Apple Intelligence support additionally requires an iPhone 15 Pro or newer, or an Apple Silicon Mac.

Is Apple's on-device transcription as accurate as Whisper or Otter?

Independent benchmarks by Argmax show Apple's SpeechAnalyzer model matches the speed and accuracy of mid-tier OpenAI Whisper models on long-form conversational speech. A separate breakdown by Blake Crosley reports that it is about 2× faster than Whisper Large V3 Turbo. SpeechAnalyzer does not yet support custom vocabulary, so the older SFSpeechRecognizer can still do better on domain-specific keywords; accuracy depends on the use case.

Do I need an internet connection for Neural Engine transcription?

No. The SpeechAnalyzer and SpeechTranscriber APIs introduced in iOS 26 run entirely on-device, with language models downloaded once via Apple's AssetInventory system. The older SFSpeechRecognizer path can also be kept on-device by setting `requiresOnDeviceRecognition` on the recognition request. Once language assets are installed, transcription works in airplane mode, in secure facilities, or anywhere else without connectivity.

Does Apple charge per minute for on-device transcription like cloud APIs do?

No. Apple's Speech and Foundation Models frameworks have no per-minute fees, no API quotas, and no monthly subscription. Apps using them pay zero variable cost. Compare that to OpenAI Whisper API at $0.006 per minute, Otter Business at $30/user/month, or AWS Transcribe at roughly $0.024 per minute — costs that compound with every hour of meetings.

Can third-party apps use the same Neural Engine that powers Apple Intelligence?

Yes. Apple's Foundation Models framework, expanded at WWDC 2026, gives developers direct Swift API access to the same on-device AFM 3 Core model that powers Apple Intelligence. Combined with SpeechAnalyzer for transcription, apps like Basil AI can summarize meetings, extract action items, and generate notes without sending audio to any external server.

What's the difference between Apple Intelligence and Private Cloud Compute?

Apple Intelligence runs primarily on-device using AFM 3 Core models. When a request exceeds on-device capability, it can route to Private Cloud Compute — Apple's encrypted server tier where, per Apple, personal data is not stored or made accessible to Apple. For meeting transcription specifically, Basil AI keeps everything on-device and never touches Private Cloud Compute or any other server.