Running an AI model on a device takes more than having the model file available. The device also needs a software framework (a set of tools) that knows how to load the model, distribute the calculations across the available hardware, and return results quickly. MLX is an open-source framework developed by Apple for this purpose. It is designed to run AI models as efficiently as possible on Apple Silicon, Apple’s family of in-house chips, which includes the M-series in Macs and iPads and the A-series in iPhones.
What makes Apple Silicon different
Conventional computers with a dedicated graphics card keep two separate memory pools: system RAM for the processor (CPU) and video memory for the graphics processor (GPU), connected by a data bus. Data often has to be copied between the two pools before each processor can work on it, which adds latency and limits throughput. Apple Silicon uses a different design called unified memory architecture. The CPU and the GPU sit on a single chip, with the memory mounted in the same package. They share one memory pool and can access it at high bandwidth and low latency, without copying data back and forth. This architecture is well suited to AI workloads, which read enormous numbers of model parameters from memory and perform calculations on them in rapid succession. MLX is built around this design: its arrays live in the shared memory, so the same data can be processed on the CPU or the GPU without being moved. This helps AI models run efficiently on Apple devices compared with conventional hardware of similar size.
MLX and Isa
When you run Isa on an iPhone, iPad, or Mac with Apple Silicon, MLX is the underlying framework that powers on-device transcription, on-device LLM inference, and on-device model personalization. The AI processing (transcribing your dictation, reasoning about a clinical note, drafting a letter) runs locally on your device. No cloud connection is required for these operations. When you personalize your own automatic speech recognition (ASR) model using Train your own medical voice AI, both the training and the inference run through MLX on your own machine. This means you can build and run a model trained on your own clinical vocabulary, in your own language and accent, entirely within your own hardware environment.
Why it matters for clinicians
- Fast, local processing: MLX takes full advantage of Apple Silicon’s unified memory architecture, delivering low-latency AI responses directly on your device without waiting for a network round-trip.
- No specialist hardware required: You do not need a server, a GPU workstation, or a cloud subscription to run capable AI models. A modern iPhone or MacBook is sufficient.
- Enables personalization: Because MLX supports efficient on-device training as well as inference, you can fine-tune your own clinical models on your own hardware, keeping your data entirely within your control throughout the process.
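To make the hardware points above more concrete, here is a rough back-of-envelope sketch in Python. It estimates how much memory a model's weights occupy and how fast the model could generate text if speed were limited only by how quickly those weights can be read from memory. The model size (3 billion parameters), the memory bandwidth (100 GB/s), and the function names are illustrative assumptions, not figures for Isa or for any specific Apple device. The estimate also ignores everything except the weights, so real-world numbers will be lower.

```python
# Back-of-envelope estimate. Every number here is an illustrative assumption,
# not a measurement of Isa, MLX, or any particular Apple device.

def model_size_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate size of the model weights in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


def max_tokens_per_second(size_gb: float, bandwidth_gb_s: float) -> float:
    """Upper bound on generation speed, assuming every new token requires
    reading all of the weights from memory exactly once."""
    return bandwidth_gb_s / size_gb


if __name__ == "__main__":
    params = 3e9       # hypothetical 3-billion-parameter model
    bandwidth = 100.0  # hypothetical memory bandwidth in GB/s

    for bits in (16, 4):  # 16-bit weights vs. 4-bit compressed weights
        size = model_size_gb(params, bits)
        speed = max_tokens_per_second(size, bandwidth)
        print(f"{bits}-bit weights: {size:.1f} GB, at most {speed:.0f} tokens/s")
```

Under these assumptions, storing each parameter in fewer bits shrinks the memory footprint and raises the speed ceiling in the same proportion, which is why memory capacity and memory bandwidth matter so much for on-device AI.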
Next
- Quantization: See how models are compressed to fit on a phone.
- On-device vs. cloud: Understand the trade-offs of running models on-device.
- Train medical voice AI: Train your own model on Apple Silicon.

