Primary Agent - Isaree Docs

The Primary Agent is the agent you talk to in Patient Chat — every Patient Chat is a conversation between you and the Primary Agent, about the patient. The Primary Agent interprets what you say, calls Agents and MCP Servers, and produces the response. You have to pick a Primary Agent before you can open a Patient Chat. The Primary Agent does not operate in Workspace — Workspace lets you chat directly with an Agent. This page covers two things: the model behind the Primary Agent, and its system prompt.

Open the Primary Agent Model picker

Open Settings in Isa, then open Primary Agent Model. The picker has three parts:

IN USE at the top — the active variant, with size, capabilities, and where it runs (on-device or cloud).
Thinking Mode toggle — for models that support step-by-step reasoning.
Model Families — grouped by maker (Qwen 3.5, Liquid 2.5, Gemma 3, Llama 3.2, Qwen 3) for on-device, plus OpenAI and Aki.io for cloud. A green dot marks the family the active variant comes from.

Pick on-device or cloud

The first decision is where the model runs:

On-device — the model runs on your iPhone, iPad, or Mac. What the Primary Agent processes — the chat, transcripts, OCR text — never leaves your device. Restricted to models small enough to fit in RAM.
Cloud — the model runs via OpenAI or Aki.io. The conversation leaves your device when it does. You bring your own API key.

See On-device vs. cloud for the deeper trade-off, and Data and privacy for what data flows where.

The Primary Agent and any Agent you invoke share your device’s RAM. Pick a Primary Agent small enough to leave room for the Agents you actually use.

Pick an on-device Primary Agent

The on-device families are pre-vetted lists of open-weight models that run on Apple hardware. Open a family to see its variants. Each variant lists size in GB, capabilities (Vision, Thinking), and the iPhone it’s tested on. Solid defaults for clinical use:

Qwen 3.5 (4B) — careful and thorough for longer histories, nuanced summaries, and busy documents. Best on iPhone 16 Pro / 17 Pro.
Qwen 3.5 (2B) — everyday default for visit summaries, referrals, and lab-photo extraction. Best on iPhone 15 Pro / 16 / 17.
LFM 2.5 Thinking (1.2B) (Liquid 2.5 family) — small reasoning model for tool-heavy workflows when you need to leave room for Agents.

Download a variant to install it. Once it’s installed, select it to make it active. Variants you’ve downloaded show Delete instead. For deeper sizing guidance, see Choose a model.

Pick a cloud Primary Agent

OpenAI and Aki.io are families inside the picker. Open one and you’ll see:

API key at the top — paste your key and save it. The key stays on your device. Once saved, the family shows an “API key saved” badge with the last four characters.
Variants below — pick one to make it active. There’s nothing to download.

Two cloud providers:

OpenAI — OpenAI’s proprietary models.
Aki.io — hosts open-weight models (Llama 3.3 70B, Qwen 3.6 35B, Gemma 4 26B, GPT-OSS 120B, MiniMax M2.5 230B) on EU infrastructure and uses the same API format as OpenAI. Cheaper than OpenAI.

Switch between models

To change the active Primary Agent, open the picker and pick any other variant — on-device or cloud, in any family. The new variant becomes active immediately and the IN USE panel updates. API keys and downloaded models stay in place, so switching back is a single tap.

Turn on Thinking Mode

The Thinking Mode toggle at the top of the picker lets compatible models reason step-by-step before answering. Models that support it carry a Thinking badge. Thinking Mode improves accuracy on complex questions but slows responses. It can also reduce reliability when the Primary Agent is calling MCP Servers or other Agents — turn it off if you’re running tool-heavy workflows and seeing flaky behavior. A handful of variants (like Qwen 3 Thinking (4B)) are always-thinking regardless of the toggle.

Edit the system prompt

The system prompt tells the Primary Agent how to behave in Patient Chat — its tone, and what it should and shouldn’t do. Open it from Settings → System Prompt. An Agent’s system prompt and the Primary Agent’s system prompt do different jobs. An Agent’s prompt is narrow — it describes the one task that Agent is built for (summarise a visit, extract labs, draft a referral). The Primary Agent’s prompt is broader — it sets the tone, ground rules, and response style across every Patient Chat.

Changing the system prompt changes how the Primary Agent behaves in every Patient Chat conversation. Edit carefully — if you’re unsure, ask in Discord.

Know what else this affects

The model you pick has consequences elsewhere in Isa:

Where the Primary Agent’s data goes. An on-device variant keeps the chat, OCR text, and Scribe transcripts the Primary Agent sees on the device. A cloud variant sends them to the provider. This setting only governs the Primary Agent — cloud Scribe Agents and external MCP Servers still send their own data off-device regardless of which Primary Agent you pick.
Camera is VLM-only. The Camera button in Patient Chat only appears when the active variant is a VLM (carries the Vision badge). OCR via Scan Doc works either way.

Choose a model

Size the Primary Agent for the device — parameter count, quantization, RAM headroom.

Data and privacy

What stays on-device versus what leaves it.

Patient Chat

The main surface where the Primary Agent does its work.

​Open the Primary Agent Model picker

​Pick on-device or cloud

​Pick an on-device Primary Agent

​Pick a cloud Primary Agent

​Switch between models

​Turn on Thinking Mode

​Edit the system prompt

​Know what else this affects

​Next

Choose a model

Data and privacy

Patient Chat

Open the Primary Agent Model picker

Pick on-device or cloud

Pick an on-device Primary Agent

Pick a cloud Primary Agent

Switch between models

Turn on Thinking Mode

Edit the system prompt

Know what else this affects

Next