General-purpose automatic speech recognition (ASR) models struggle with medical terminology. Words like “Dermatofibrosarkoma” or “Efalizumab” often come out as gibberish in the transcript. Fixing this usually requires a large dataset of recorded medical dictations, which most clinicians do not have. In this tutorial, you will learn how to generate your own synthetic medical dataset and train a small, highly accurate voice model on it. Every step runs locally on your Mac: no cloud dependency, no API costs, and complete patient data privacy.
What You Need Before You Start
This tutorial is designed for Apple Mac computers. You will need:
- A Mac with an Apple Silicon chip (M1, M2, M3, or M4).
- macOS 13.5 or newer.
- About 40 GB of free disk space to store the models and generated audio.
Step 1: Install the Necessary Tools
We need to install a few tools to run the tutorial. If you have never used the “Terminal” before, don’t worry: it’s just a place to type commands.
- Open the Terminal app on your Mac (press Cmd + Space, type “Terminal”, and hit Enter).
- First, we need Ollama, an app that lets your Mac run AI models locally. Download and install it from ollama.com.
- Once Ollama is installed, go back to your Terminal and type this command, then press Enter:
This downloads the AI model that will write our medical sentences. It is a large file (~22 GB), so this might take a while depending on your internet connection.
- Next, we need uv, a tool that manages Python code. Paste this command into the Terminal and press Enter:
Step 2: Download the Tutorial Files
We have prepared all the code for you in a “repository” (a folder of code) on GitHub. You just need to download it to your Mac.
- In your Terminal, type this command and press Enter to download the folder:
- Now, move into the folder you just downloaded by typing:
- Finally, install all the required Python packages by typing:
Step 3: Open the Tutorial Notebook
We use something called a “Jupyter Notebook” to run the code. It lets you run small blocks of code one at a time and see the results immediately.
- Make sure Ollama is running in the background. Open a new Terminal window and type:
Leave this window open.
- Go back to your first Terminal window (which should still be in the tutorials/asr-tutorial folder) and type:
- A web page will automatically open in your browser showing the tutorial code.
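If Stage 1 later fails to connect, it helps to confirm that Ollama is actually reachable. The check below is a minimal sketch, not part of the tutorial repository: it assumes Ollama is listening on its default local address (http://localhost:11434), and the function name ollama_is_running is made up for illustration.

```python
import urllib.error
import urllib.request


def ollama_is_running(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if an HTTP server answers at base_url with status 200.

    Assumption: Ollama uses its default address. Adjust base_url if you
    configured a different host or port.
    """
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or DNS failure: nothing is listening.
        return False


if __name__ == "__main__":
    print("Ollama reachable:", ollama_is_running())
```

You can paste this into a new notebook cell and run it with Shift + Enter. If it reports False, return to the Terminal window where Ollama should be running and check that it is still open.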
Step 4: Run the Pipeline
The notebook is divided into 6 stages. To run a block of code (called a “cell”), click on it and press Shift + Enter.
Here is exactly what happens at each stage:
Stage 1: Generate Medical Text
The first cell uses Ollama to write realistic German dermatology sentences (for example, the German equivalent of “The patient presents with an erythematous plaque”). It automatically rejects sentences that contain abbreviations, which keeps the text suitable for speech synthesis and voice training. Click the cell and press Shift + Enter to generate 50 test sentences.
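To give a feel for the abbreviation check, here is a rough sketch of how such a filter could work. It is an illustration under simple assumptions, not the notebook's actual code: the names contains_abbreviation and filter_sentences are hypothetical, and the heuristic only looks for all-caps tokens and periods that do not end the sentence.

```python
import re

# Assumption: an abbreviation is either an all-caps token ("BCC", "EKG")
# or a word followed by a period that does not end the sentence ("ca.", "z.B.").
ALL_CAPS = re.compile(r"\b[A-ZÄÖÜ]{2,}\b")


def contains_abbreviation(sentence: str) -> bool:
    """Heuristically decide whether a sentence contains an abbreviation."""
    # Drop the sentence-final punctuation so only inner periods remain.
    body = sentence.strip().rstrip(".!?")
    if "." in body:
        return True
    return bool(ALL_CAPS.search(body))


def filter_sentences(sentences):
    """Keep only sentences that pass the abbreviation check."""
    return [s for s in sentences if not contains_abbreviation(s)]


if __name__ == "__main__":
    candidates = [
        "Der Patient zeigt eine erythematöse Plaque.",
        "Pat. zeigt ein BCC.",
        "Die Läsion ist ca. 2 cm groß.",
    ]
    for sentence in filter_sentences(candidates):
        print(sentence)
```

A heuristic like this will miss some cases (for example, an abbreviation at the very end of a sentence), so treat it as a starting point rather than a complete filter.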
Stage 2: Synthesize Audio
The next cell takes those written sentences and turns them into spoken audio using a Text-to-Speech (TTS) model. It also creates “noisy” and “sped up” versions of the audio to help the model learn to understand different speaking conditions. Click the cell and press Shift + Enter.
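The “noisy” and “sped up” variants can be pictured with a small sketch. This is not the notebook's implementation: it assumes audio is a plain list of float samples, uses white Gaussian noise at a chosen signal-to-noise ratio, and speeds the clip up by simple linear-interpolation resampling (which also raises the pitch). The function names and the 1.1 speed factor are illustrative choices.

```python
import math
import random


def add_noise(samples, snr_db, seed=0):
    """Return a copy of samples with white Gaussian noise at the given SNR in dB.

    Assumes samples is a non-empty list of floats.
    """
    rng = random.Random(seed)
    signal_power = sum(s * s for s in samples) / len(samples)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in samples]


def speed_up(samples, factor):
    """Resample by linear interpolation so the clip plays `factor` times faster."""
    n_out = int(len(samples) / factor)
    out = []
    for i in range(n_out):
        pos = i * factor                      # fractional position in the input
        left = int(pos)
        right = min(left + 1, len(samples) - 1)
        frac = pos - left
        out.append(samples[left] * (1 - frac) + samples[right] * frac)
    return out


# Stand-in for a synthesized sentence: one second of a 440 Hz tone at 16 kHz.
sample_rate = 16_000
tone = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(sample_rate)]

noisy = add_noise(tone, snr_db=20)
fast = speed_up(tone, factor=1.1)
print(len(tone), len(noisy), len(fast))
```

The noisy clip keeps its length, while the sped-up clip has fewer samples because the same speech is squeezed into less time.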
Stage 3: Package the Dataset
This quick step sorts your generated audio into three piles: Training data (to teach the model), Validation data (to check its progress), and Test data (to grade its final performance). Click the cell and press Shift + Enter.
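As a rough picture of the three piles, the sketch below shuffles a list of sentence IDs and cuts it into train, validation, and test sets. The 80/10/10 ratio, the seed, and the function name split_dataset are assumptions for illustration; the notebook may use different proportions.

```python
import random


def split_dataset(items, train_frac=0.8, val_frac=0.1, seed=42):
    """Shuffle items reproducibly and split them into train/validation/test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]   # whatever remains goes to the test set
    return train, val, test


# Hypothetical IDs for the 50 generated sentences.
sentence_ids = [f"sentence_{i:03d}" for i in range(50)]
train, val, test = split_dataset(sentence_ids)
print(len(train), len(val), len(test))
```

Splitting by sentence ID, rather than by audio file, keeps the clean, noisy, and sped-up versions of one sentence in the same pile, so the test set never contains a sentence the model already heard during training.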
Stage 4: Finetune the Model
This is the core step. Your Mac will now teach a base voice model (Qwen3-ASR) to understand the medical words you generated. It does this by creating a small “adapter” that sits on top of the base model.
Click the cell and press Shift + Enter. This will take a few minutes.
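The tutorial does not say which adapter method it uses. One common approach is a low-rank (LoRA-style) adapter, and the toy sketch below assumes that approach purely for illustration: the base weights W stay frozen, and only two small matrices A and B would be trained. All names and sizes here are hypothetical.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def adapted_forward(W, A, B, x, scale=1.0):
    """Compute y = W x + scale * B (A x).

    W is the frozen base weight (d_out x d_in); A (r x d_in) and
    B (d_out x r) form the small trainable adapter.
    """
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    return [b + scale * d for b, d in zip(base, delta)]


def count_parameters(d_in, d_out, rank):
    """Compare the size of a full weight matrix with a rank-`rank` adapter."""
    return d_out * d_in, rank * (d_in + d_out)


# Tiny example: 4 inputs, 3 outputs, adapter rank 1.
W = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0]]
A = [[0.5, 0.5, 0.5, 0.5]]
B_zero = [[0.0], [0.0], [0.0]]          # an untrained adapter changes nothing
x = [1.0, 2.0, 3.0, 4.0]

print(adapted_forward(W, A, B_zero, x))
print(count_parameters(d_in=1024, d_out=1024, rank=8))
```

In this sketch, a 1024 x 1024 layer has 1,048,576 weights, while a rank-8 adapter for it has 16,384. Training far fewer numbers is what would make finetuning feasible on a laptop under this assumption.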
Stage 5: Evaluate
Once training is done, this cell tests the new model. It compares the “Word Error Rate” (WER) of the original model against your newly trained model. Lower numbers are better! Click the cell and press Shift + Enter.
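WER is the number of word substitutions, deletions, and insertions needed to turn the model's transcript into the correct text, divided by the number of words in the correct text. The sketch below computes it with a standard edit-distance table. It is a minimal illustration, not the notebook's evaluation code, which may also normalize capitalization or punctuation before comparing.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between the two texts, divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # previous[j]: edits needed to turn the first j hypothesis words
    # into the reference words processed so far.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


reference = "Therapie mit Efalizumab begonnen"
base_output = "Therapie mit Eva Lizumab begonnen"   # made-up example of a misheard drug name
print(word_error_rate(reference, base_output))      # 2 errors / 4 words = 0.5
print(word_error_rate(reference, reference))        # perfect transcript = 0.0
```

Because a single misheard drug name can count as several word errors, WER is sensitive to exactly the kind of mistakes this tutorial targets.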
Stage 6: Try It Yourself
Now for the fun part. You can record your own voice saying a medical sentence, save it as my_recording.wav in the same folder, and the notebook will transcribe it using your custom model.
Adapting to Your Own Specialty
The tutorial defaults to German Dermatology, but you can change it to any specialty (like Cardiology or Neurology). To do this, make the following changes before running Stage 1:
- Open the file asr/taxonomy.json and replace the skin conditions with conditions from your specialty.
- In the Stage 1 cell of the notebook, change the vocabulary list to include terms from your field (e.g., “Auskultation”, “Myokardinfarkt”).
- Change the specialty parameter to match your field.
What’s Next?
You have just trained a medical AI model on your own computer! The default tutorial only runs 50 samples to show you how it works. To build a production-ready model, increase the n_samples number in Stage 1 to generate thousands of sentences, and let your Mac run overnight.
Want to run your new finetuned model on your phone for your clinical workflow? Visit Isaree.ai to learn how to deploy it securely.
