The Primary Agent is the agent you talk to in Patient Chat — every Patient Chat is a conversation between you and the Primary Agent, about the patient. The Primary Agent interprets what you say, calls Agents and MCP Servers, and produces the response. You have to pick a Primary Agent before you can open a Patient Chat. The Primary Agent does not operate in Workspace — Workspace lets you chat directly with an Agent. This page covers two things: the model behind the Primary Agent, and its system prompt.Documentation Index
Fetch the complete documentation index at: https://isaree-cd4b6397.mintlify.app/llms.txt
Use this file to discover all available pages before exploring further.
Open the Primary Agent Model picker
Open Settings in Isa, then open Primary Agent Model. The picker has three parts:- IN USE at the top — the active variant, with size, capabilities, and where it runs (on-device or cloud).
- Thinking Mode toggle — for models that support step-by-step reasoning.
- Model Families — grouped by maker (Qwen 3.5, Liquid 2.5, Gemma 3, Llama 3.2, Qwen 3) for on-device, plus OpenAI and Aki.io for cloud. A green dot marks the family the active variant comes from.
Pick on-device or cloud
The first decision is where the model runs:- On-device — the model runs on your iPhone or iPad. What the Primary Agent processes — the chat, transcripts, OCR text — never leaves your device. Restricted to models small enough to fit in RAM.
- Cloud — the model runs via OpenAI or Aki.io. The conversation leaves your device when it does. You bring your own API key.
Pick an on-device Primary Agent
The on-device families are pre-vetted lists of open-weight models that run on Apple hardware. Open a family to see its variants. Each variant lists size in GB, capabilities (Vision, Thinking), and the iPhone it’s tested on. Solid defaults for clinical use:Qwen 3.5 (4B)— careful and thorough for longer histories, nuanced summaries, and busy documents. Best on iPhone 16 Pro / 17 Pro.Qwen 3.5 (2B)— everyday default for visit summaries, referrals, and lab-photo extraction. Best on iPhone 15 Pro / 16 / 17.LFM 2.5 Thinking (1.2B)(Liquid 2.5 family) — small reasoning model for tool-heavy workflows when you need to leave room for Agents.
Pick a cloud Primary Agent
OpenAI and Aki.io are families inside the picker. Open one and you’ll see:- API key at the top — paste your key and save it. The key stays on your device. Once saved, the family shows an “API key saved” badge with the last four characters.
- Variants below — pick one to make it active. There’s nothing to download.
- OpenAI — OpenAI’s proprietary models.
- Aki.io — hosts open-weight models (Llama 3.3 70B, Qwen 3.6 35B, Gemma 4 26B, GPT-OSS 120B, MiniMax M2.5 230B) on EU infrastructure and uses the same API format as OpenAI. Cheaper than OpenAI.
Switch between models
To change the active Primary Agent, open the picker and pick any other variant — on-device or cloud, in any family. The new variant becomes active immediately and the IN USE panel updates. API keys and downloaded models stay in place, so switching back is a single tap.Turn on Thinking Mode
The Thinking Mode toggle at the top of the picker lets compatible models reason step-by-step before answering. Models that support it carry a Thinking badge. Thinking Mode improves accuracy on complex questions but slows responses. It can also reduce reliability when the Primary Agent is calling MCP Servers or other Agents — turn it off if you’re running tool-heavy workflows and seeing flaky behavior. A handful of variants (likeQwen 3 Thinking (4B)) are always-thinking regardless of the toggle.
Edit the system prompt
The system prompt tells the Primary Agent how to behave in Patient Chat — its tone, and what it should and shouldn’t do. Open it from Settings → System Prompt. An Agent’s system prompt and the Primary Agent’s system prompt do different jobs. An Agent’s prompt is narrow — it describes the one task that Agent is built for (summarise a visit, extract labs, draft a referral). The Primary Agent’s prompt is broader — it sets the tone, ground rules, and response style across every Patient Chat.Know what else this affects
The model you pick has consequences elsewhere in Isa:- Where the Primary Agent’s data goes. An on-device variant keeps the chat, OCR text, and Scribe transcripts the Primary Agent sees on the device. A cloud variant sends them to the provider. This setting only governs the Primary Agent — cloud Scribe Agents and external MCP Servers still send their own data off-device regardless of which Primary Agent you pick.
- Camera is VLM-only. The Camera button in Patient Chat only appears when the active variant is a VLM (carries the Vision badge). OCR via Scan Doc works either way.
- Scribe Agent extraction. When the Extraction provider in Settings is set to On-Device, it uses the Primary Agent’s loaded model. If the Primary Agent’s model is cloud, extraction also goes off-device — independently of the Scribe Agent’s own Transcription setting.
Next
Choose a model
Size the Primary Agent for the device — parameter count, quantization, RAM headroom.
Data and privacy
What stays on-device versus what leaves it.
Patient Chat
The main surface where the Primary Agent does its work.

