We build a medical documentation API at Twofold, so we evaluate this category the way an engineering team integrating one would: by reading the docs, signing the BAA, and running real clinical audio through it. This guide applies that lens. We include our own product and say exactly where it fits and, just as importantly, where it doesn't.
The first thing to understand is that "medical speech-to-text API" describes three very different tiers of product. Choosing the wrong tier is the most expensive mistake teams make in this space: you either overpay for capabilities you don't need or spend a year building the layer you should have bought.
A note on accuracy claims before we start: vendors publish word error rates (WER), and most quote impressive single-digit figures. None of these come from independent benchmarks. Throughout this guide we treat vendor-published numbers as exactly that, vendor claims, and we recommend benchmarking any shortlisted API on your own representative audio before committing.
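If you run that benchmark yourself, WER is the word-level edit distance between a reference transcript and the API's output, divided by the number of reference words: (substitutions + deletions + insertions) / N. Below is a minimal Python sketch. It assumes plain lowercasing and whitespace tokenization. A real evaluation would also normalize punctuation, numbers and units before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference words.

    Assumes simple lowercasing and whitespace tokenization; no other text
    normalization is applied.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the previous reference prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from output
                curr[j - 1] + 1,     # insertion: extra word in output
                prev[j - 1] + cost,  # substitution (or exact match when cost == 0)
            )
        prev = curr
    return prev[-1] / len(ref)


# A drug name misheard as two words costs one substitution plus one insertion.
wer = word_error_rate(
    "start metoprolol 25 mg twice daily",
    "start metro pole 25 mg twice daily",
)
print(f"WER = {wer:.3f}")
```

In this example, one misheard drug name in a six-word medication order produces a WER of 2/6 (printed as 0.333).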
How we evaluated these APIs
We ranked each option on the criteria that actually determine whether a healthcare product ships and stays maintainable, not on marketing accuracy figures alone:
| Criterion | What we looked for | Why it matters |
|---|---|---|
| Output tier | Raw transcript, medical-tuned transcript, or finished clinical note | Determines how much you build on top |
| HIPAA / BAA | Is a BAA available, on which plans, with what retention terms | Non-negotiable for handling PHI |
| Medical accuracy | Vocabulary coverage for drugs, labs, and clinical terms | General STT mis-hears medication and dosage terms |
| Latency & mode | Streaming vs. batch, real-time vs. asynchronous | Live encounters need streaming; bulk jobs don't |
| Developer experience | SDKs, docs quality, webhooks, sandbox access | Drives time-to-first-integration and maintenance cost |
| Pricing model | Per-minute, per-request, or platform/seat licensing | Per-minute scales with usage; SDK/seat licensing doesn't |
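An aggregate word error rate can hide the very errors the medical-accuracy criterion targets. For that reason it helps to also score recall on a list of clinical terms. Below is a minimal sketch under these assumptions:

- Terms are single words.
- You supply a lowercase lexicon; the drug names here are only illustrative.
- Punctuation stripping is simple.

The sketch ignores word position. It measures whether terms survive transcription, not whether they land in the right place.

```python
from collections import Counter
from typing import Optional

PUNCTUATION = ".,;:!?()\"'"


def tokens(text: str) -> list[str]:
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    return [t.strip(PUNCTUATION) for t in text.lower().split() if t.strip(PUNCTUATION)]


def term_recall(reference: str, hypothesis: str, lexicon: set[str]) -> Optional[float]:
    """Fraction of lexicon-term occurrences in the reference that the hypothesis reproduces.

    Counts are clipped per term, so repeating one drug name in the output cannot
    make up for missing a different one. Returns None when the reference
    contains no lexicon terms.
    """
    ref_counts = Counter(t for t in tokens(reference) if t in lexicon)
    hyp_counts = Counter(t for t in tokens(hypothesis) if t in lexicon)
    total = sum(ref_counts.values())
    if total == 0:
        return None
    matched = sum(min(count, hyp_counts[term]) for term, count in ref_counts.items())
    return matched / total


lexicon = {"metoprolol", "lisinopril", "atorvastatin"}
recall = term_recall(
    "Continue lisinopril 10 mg and start metoprolol 25 mg.",
    "Continue lisinopril 10 mg and start metro pole 25 mg.",
    lexicon,
)
print(recall)
```

Here the output keeps one of the two drug names, so the recall is 0.5.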
The three tiers of medical speech-to-text
Before the ranking, anchor on these tiers. Every product below sits in one or two of them, and the right pick depends entirely on which tier your product needs.
Tier 1 — General speech-to-text
Fast, cheap, high-quality transcription that isn't tuned for medicine. It excels on general audio but tends to mis-transcribe medication names, dosages, and clinical shorthand. In healthcare it's usable for non-clinical audio (scheduling, support) or as a base you fine-tune yourself.
Tier 2 — Medical-tuned ASR
Transcription with models trained on clinical vocabulary — drug names, anatomy, lab terms, abbreviations. Returns an accurate medical transcript. You still build summarization, note structure, and any coding on top.
Tier 3 — Ambient clinical documentation API
Listens to a full encounter and returns a finished, structured clinical note — and often problems, medications, and codes — rather than a transcript. This tier removes the largest and riskiest part of building a scribe: turning conversation into a defensible clinical note.
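The three tier descriptions reduce to a small decision rule. The sketch below is illustrative only. `Tier` and `choose_tier` are hypothetical names, not part of any vendor SDK. The few yes/no questions they ask simplify the trade-offs discussed in this guide.

```python
from enum import IntEnum


class Tier(IntEnum):
    """The three product tiers described in this guide (hypothetical names)."""
    GENERAL_STT = 1            # raw transcript, not tuned for medicine
    MEDICAL_ASR = 2            # medical-tuned transcript; you build the note on top
    AMBIENT_DOCUMENTATION = 3  # finished clinical note, often with structured data


def choose_tier(needs_clinical_note: bool,
                audio_is_clinical: bool,
                will_fine_tune_own_model: bool = False) -> Tier:
    """Pick a starting tier from the questions that separate the three tiers."""
    if needs_clinical_note:
        # Only Tier 3 returns a note; lower tiers leave note generation to you.
        return Tier.AMBIENT_DOCUMENTATION
    if audio_is_clinical and not will_fine_tune_own_model:
        # Clinical vocabulary without a note: medical-tuned ASR.
        return Tier.MEDICAL_ASR
    # Non-clinical audio, or a general model you plan to adapt yourself.
    return Tier.GENERAL_STT


print(choose_tier(needs_clinical_note=False, audio_is_clinical=True).name)
```

The note question is checked first because it separates Tier 3 from everything else. Whether the audio is clinical then decides between Tiers 1 and 2.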

Quick comparison
A high‑level map of the shortlist. Detailed write‑ups follow.
| API | Tier | Primary output | BAA available | Best for |
|---|---|---|---|---|
| Twofold | 3 | Finished clinical note + structured data | Yes | Products that need notes, not transcripts |
| Deepgram Nova-3 Medical | 2 | Medical transcript (streaming + batch) | Yes | Low-latency medical ASR at scale |
| AWS Transcribe Medical | 2 | Medical transcript | Yes | Teams standardized on AWS |
| AWS HealthScribe | 3 | Note + transcript (batch) | Yes | AWS-native ambient documentation |
| AssemblyAI | 2 | Transcript (medical mode) | Yes | Developer-friendly ASR + audio intelligence |
| Corti | 2 / 3 | Transcript + clinical documentation | Yes | API-native ambient documentation |
| Suki | 2 / 3 | Transcript + note (SDK) | Yes | Embedding a proven assistant via SDK |
| Nabla | 3 | Note (white-label) | Yes | White-label ambient scribe in your app |
| Google Cloud STT (medical models) | 2 | Medical transcript | Yes | Teams standardized on Google Cloud |
| Nuance Dragon Medical SpeechKit | 2 | Medical transcript (SDK) | Yes | Deep clinical vocabulary via SDK |

1. Twofold — Clinical documentation API (Tier 3)
We rank our own clinical documentation API first for one specific reason: in our experience, most teams shopping for a medical speech-to-text API actually want a note, not a transcript. Twofold takes encounter audio and returns a finished, specialty-aware clinical note plus structured data. The summarization, formatting, and template layer is already built and clinically maintained.
That is also the honest boundary of this pick. If you only need a raw transcript to feed your own pipeline, a Tier 1/2 API will be cheaper and more flexible, and we'd point you there. Twofold earns the top spot when the alternative is building and maintaining the note‑generation layer yourself — a large, ongoing investment in templates, safety guardrails, and clinician‑editing UX.
- Best for: products that need to display or store a clinical note, especially in mental and behavioral health.
- Also available as a white-label partner program if you want the documentation experience inside your own product without building it.
- Watch-out: overkill if your use case genuinely ends at the transcript.
2. Deepgram Nova-3 Medical (Tier 2)
Deepgram is the strongest pure medical‑ASR pick for teams that want low latency and high throughput. Nova‑3 Medical is tuned for clinical vocabulary, supports both streaming and pre‑recorded modes, and has a clean, well‑documented API with usage‑based per‑minute pricing that scales predictably.
- Strengths: fast streaming, strong clinical vocabulary, transparent per-minute pricing, excellent docs.
- Watch-out: it's a transcription engine — you build summarization and note structure on top. Confirm BAA terms for your plan.
3. AWS Transcribe Medical + HealthScribe (Tier 2 + Tier 3)
If you're already on AWS, the platform gives you both tiers. Transcribe Medical is medical‑tuned ASR; HealthScribe is an ambient documentation service that returns a structured note plus a transcript with evidence linking. The integration story is excellent when your infrastructure already lives in AWS and you want a single vendor and BAA.
- Strengths: native AWS integration, one BAA across services, HealthScribe's evidence linking for clinician trust.
- Watch-out: HealthScribe has historically been batch-only with limited language and specialty coverage — verify current support against your use case before committing.
4. AssemblyAI (Tier 2)
AssemblyAI is one of the most developer‑friendly ASR platforms, with a medical mode, strong audio‑intelligence features, and a BAA. It's a good fit when you want excellent transcription plus building blocks (summarization, topic detection) without committing to a full clinical‑documentation product.
- Strengths: great DX, audio-intelligence features, BAA available.
- Watch-out: published accuracy figures vary across their materials — benchmark on your own clinical audio.
5. Corti (Tier 2 / 3)
Corti is API‑native and purpose‑built for healthcare, spanning medical transcription and ambient documentation with a focus on real‑time clinical workflows. It's a strong choice when you want a healthcare‑specialized vendor rather than a general STT platform with a medical mode.
- Strengths: healthcare-first design, real-time documentation, API-native.
- Watch-out: confirm BAA, data-residency, and regional coverage for your market.
6. Suki (Tier 2 / 3)
Suki offers its assistant capabilities to partners via an SDK (Suki Platform), letting you embed a proven ambient‑documentation experience inside your own application. It supports a broad set of languages and is attractive when you want a battle‑tested assistant rather than raw building blocks.
- Strengths: mature assistant, broad language support, embeddable via SDK.
- Watch-out: SDK/platform licensing differs from per-minute APIs — model the cost against your scale.
7. Nabla (Tier 3, white-label)
Nabla provides an ambient clinical scribe designed to be embedded and white‑labeled in your product, with broad specialty and language coverage. It's a direct alternative to a documentation API when your priority is a finished note experience inside your app rather than low‑level control.
- Strengths: white-label ambient scribe, wide specialty/language coverage.
- Watch-out: it's an experience layer — if you need granular control over the transcript or model, a Tier 2 API gives you more.
8. Google Cloud Speech-to-Text — medical models (Tier 2)
Google Cloud offers medical transcription models that are a sensible default if your stack already lives in Google Cloud. Coverage is narrower than the headline Speech-to-Text product: medical transcription has historically been offered only for specific models and locales. Confirm current model and language support before you design around it.
- Strengths: native GCP integration, single BAA across Google Cloud.
- Watch-out: medical models are more limited than general STT — check model/locale availability and pricing on the live pricing page.
9. Nuance Dragon Medical SpeechKit (Tier 2, SDK)
Dragon Medical SpeechKit is Nuance's developer-facing SDK for its clinical speech recognition. It is distinct from the DAX/Dragon Copilot scribe products, which are end-user applications rather than APIs. If you specifically need Nuance-grade medical recognition embedded in your own app, SpeechKit is the real integration path.
- Strengths: best-in-class clinical vocabulary, mature in enterprise healthcare.
- Watch-out: SDK/enterprise licensing and onboarding are heavier than a self-serve per-minute API.
Honorable mentions (Tier 1)
Speechmatics and Soniox are excellent general‑purpose STT engines with strong accuracy and language coverage. They're not medical‑tuned, so we keep them out of the main ranking — but they're worth considering for non‑clinical audio, or as a base you fine‑tune and pair with your own medical post‑processing. As with every vendor here, confirm BAA availability before sending PHI.
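Here is one illustration of medical post-processing on top of a general engine. The sketch snaps near-miss tokens to a drug lexicon, using fuzzy string matching from Python's standard-library `difflib`. It is a simplified example under these assumptions:

- Drug names are single words.
- You maintain a lowercase lexicon.
- The similarity cutoff of 0.8 is arbitrary.

Look-alike drug names make silent correction risky. For that reason the sketch returns every change so a clinician can review it, rather than hiding the substitution.

```python
import difflib


def correct_drug_names(transcript: str, lexicon: list[str],
                       cutoff: float = 0.8) -> tuple[str, list[tuple[str, str]]]:
    """Replace tokens that closely resemble a lexicon entry; report each change.

    Assumes single-word, lowercase lexicon entries. `cutoff` is the minimum
    difflib similarity ratio (0 to 1) required to accept a match.
    """
    corrected: list[str] = []
    changes: list[tuple[str, str]] = []
    for token in transcript.split():
        core = token.rstrip(".,;:")      # keep trailing punctuation out of the match
        trailing = token[len(core):]
        match = difflib.get_close_matches(core.lower(), lexicon, n=1, cutoff=cutoff)
        if match and match[0] != core.lower():
            changes.append((core, match[0]))
            corrected.append(match[0] + trailing)
        else:
            corrected.append(token)      # no close match, or already correct
    return " ".join(corrected), changes


text, changes = correct_drug_names(
    "start metoprolal 25 mg daily, continue lisinopril.",
    ["metoprolol", "lisinopril", "atorvastatin"],
)
print(text)
print(changes)
```

A production pipeline would also need multi-word terms and dosage checks. This sketch covers only the token-level idea.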
How to choose the right tier for your product
- Decide whether you need a transcript or a note. This single question eliminates most of the list. Need a note? Look at Tier 3. Need a transcript? Tier 1/2.
- Match the vendor to your cloud. If you're committed to AWS or Google Cloud, starting with their medical services simplifies your BAA and billing.
- Benchmark on your own audio. Build a small evaluation set from representative encounters and measure WER yourself before committing.
- Confirm the BAA and retention terms in writing — including which plan they apply to and how long audio and transcripts are stored.
- Model the cost at your real scale. Per-minute pricing, per-request pricing, and SDK/seat licensing behave very differently as you grow.
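A toy monthly cost model shows why these pricing models diverge. All prices below are placeholders, not any vendor's actual rates, and the usage figures are hypothetical. Substitute your own numbers from quotes and pricing pages. The model also assumes one request per encounter for per-request pricing.

```python
from dataclasses import dataclass


@dataclass
class MonthlyUsage:
    clinicians: int
    encounters_per_clinician: int
    minutes_per_encounter: float

    @property
    def encounters(self) -> int:
        return self.clinicians * self.encounters_per_clinician

    @property
    def audio_minutes(self) -> float:
        return self.encounters * self.minutes_per_encounter


def per_minute_cost(usage: MonthlyUsage, rate_per_minute: float) -> float:
    # Scales with total audio minutes.
    return usage.audio_minutes * rate_per_minute


def per_request_cost(usage: MonthlyUsage, price_per_request: float) -> float:
    # Scales with encounters processed, regardless of length (assumes one request each).
    return usage.encounters * price_per_request


def seat_cost(usage: MonthlyUsage, price_per_seat: float) -> float:
    # Scales with headcount only; extra encounters per clinician cost nothing more.
    return usage.clinicians * price_per_seat


# Placeholder prices (hypothetical, for illustration only).
RATE_PER_MINUTE = 0.02
PRICE_PER_REQUEST = 0.40
PRICE_PER_SEAT = 150.0

for encounters in (200, 400):  # same team, doubling volume
    usage = MonthlyUsage(clinicians=50, encounters_per_clinician=encounters,
                         minutes_per_encounter=15.0)
    print(f"{encounters} encounters/clinician: "
          f"per-minute ${per_minute_cost(usage, RATE_PER_MINUTE):,.0f}, "
          f"per-request ${per_request_cost(usage, PRICE_PER_REQUEST):,.0f}, "
          f"seat ${seat_cost(usage, PRICE_PER_SEAT):,.0f}")
```

Doubling encounters per clinician doubles the per-minute and per-request totals, while the seat total stays flat. With these placeholder prices, per-request pricing starts out cheaper than seats and ends up more expensive.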
If your answer to the first question is "we need a clinical note," the build-vs-buy math usually favors a documentation API or partnership over assembling the note layer yourself. That's the gap our medical speech-to-text and documentation API is built to fill. For platforms that want it fully embedded and branded, our partner program offers the same engine as a white-label experience.

