ASR models - Isaree Docs

ASR stands for Automatic Speech Recognition. It is the specific type of AI model responsible for listening to audio and transcribing exactly what was said. Whenever you use voice-to-text on your phone, you are using an ASR model. However, standard ASR models (like Siri or Alexa) are trained on everyday conversation. They often fail completely when they hear complex medical terminology, drug names, or rapid clinical dictation. In healthcare, specialized medical ASR models are trained specifically to understand clinical vocabulary.

How it works in practice

You are in the pathology lab, wearing gloves, and need to dictate a gross examination of a tissue sample. You speak rapidly: “Specimen is a skin excision measuring 2.4 by 1.2 centimeters, showing an irregular, hyperpigmented macule concerning for superficial spreading melanoma.” A standard, consumer-grade ASR model might transcribe “macule” as “Mac tool” or “melanoma” as “melon aroma.” A medical ASR model trained on clinical vocabulary handles these terms correctly. Isa supports two transcription modes: on-device, where audio is processed entirely on your device and never sent anywhere, and cloud (via ElevenLabs), where audio is sent to an external server in exchange for higher accuracy. You choose which to use in the recording flow when you start a Scribe session — see Scribe in Isa.

Personalizing your ASR model

Because every clinical specialty has its own vocabulary — and every clinician has their own speaking style, accent, and preferred terminology — Isaree recommends that you personalize your ASR model rather than relying on a generic base model. A personalized model is trained on examples of your own clinical dictations, making it significantly more accurate for your specialty and speaking style. Isaree provides a step-by-step guide to building and finetuning your own medical ASR model. See Train your own medical voice AI for a full walkthrough.

Why it matters for clinicians

Medical accuracy: A medical ASR model understands the difference between “hyperthyroidism” and “hypothyroidism,” preventing dangerous transcription errors in patient records.
Saves time: You don’t have to waste time going back and manually correcting bizarre typos in your dictations.
Enables hands-free work: Accurate transcription lets you document while examining patients or performing procedures, without stopping to type.

Scribe

See how ASR fits into the full transcription and extraction pipeline.

Diarization models

Understand how Isa identifies who is speaking in a recording.

Train your own medical voice AI

Build and fine-tune an ASR model on your own clinical dictations.

Scribe Diarization models

​How it works in practice

​Personalizing your ASR model

​Why it matters for clinicians

​Next