When evaluating an AI notes vendor for clinical use, three areas directly affect HIPAA compliance: subprocessors, data storage, and model training. Subprocessors are third-party vendors that may access patient information. Storage policies determine where data resides, how it is encrypted, and when it is deleted. Training data practices define whether clinical notes are used to improve vendor models. Without clear answers in each area, covered entities take on unnecessary risk. This guide lists specific questions to ask any AI notes vendor about these three areas so you can conduct a thorough vendor assessment.

The Vendor’s Supply Chain: Questions About Subprocessors
These questions aim to uncover third-party vendors that access PHI without your knowledge.
Who Are Your Subprocessors, and Have They All Signed BAAs?
- Why it Matters: Your primary AI notes vendor might be fully HIPAA compliant on paper, but their subprocessors (such as cloud infrastructure providers, speech-to-text engines, or error-logging services) often represent the greatest unmanaged risk.
Subprocessor Questions:
- Will you provide your complete list of subprocessors?
- Which subprocessors have direct or indirect access to unredacted PHI? (Indirect access includes logging, caching, or backup systems.)
- May I review your Business Associate Agreements (BAAs) with each subprocessor?
- How often is your subprocessor list updated, and how are customers notified?
Common Subprocessor Types in AI Notes Tools
Understanding who typically sits in the supply chain helps you ask better follow‑up questions.
| Subprocessor Category | Example Function | PHI Access Risk Level |
|---|---|---|
| Cloud infrastructure (AWS, Azure) | Data storage, compute, backups | High (if unencrypted or key access exists) |
| Speech-to-text / ASR engine | Convert audio to a transcript | Very high (direct access to patient voice + clinical content) |
| LLM hosting platform (OpenAI, Anthropic, etc.) | Generate summary notes | High (unless a private, zero-retention instance) |
| Error tracking and logging | Capture debugging data | Medium (often captures snippets of text) |
| Analytics & monitoring (Mixpanel, Datadog) | Usage metrics, performance | Low to medium |
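One way to work through a vendor's answers is to record each subprocessor and flag any that can reach PHI without a signed BAA. The sketch below is only an illustration: the `Subprocessor` fields and the sample entries are hypothetical and do not describe any real vendor's disclosure.

```python
from dataclasses import dataclass


@dataclass
class Subprocessor:
    # Illustrative fields; adapt them to what the vendor actually discloses.
    name: str
    category: str
    phi_access: str   # "direct", "indirect", or "none"
    baa_signed: bool


def flag_missing_baas(subprocessors):
    """Return names of subprocessors that can reach PHI but lack a signed BAA."""
    return [
        s.name
        for s in subprocessors
        if s.phi_access in ("direct", "indirect") and not s.baa_signed
    ]


# Hypothetical inventory assembled from a vendor's answers.
inventory = [
    Subprocessor("ExampleCloud", "Cloud infrastructure", "direct", True),
    Subprocessor("ExampleASR", "Speech-to-text", "direct", True),
    Subprocessor("ExampleLogs", "Error tracking", "indirect", False),
    Subprocessor("ExampleMetrics", "Analytics", "none", False),
]

print(flag_missing_baas(inventory))
```

Indirect access is treated the same as direct access here, because logging, caching, and backup systems can still hold unredacted PHI.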
Location, Encryption, and Retention: Questions About Storage
These questions focus on data residency, at‑rest protection, and deletion protocols.
Where Is My Data Stored?
- Why it Matters: Even if a vendor encrypts data, the physical location of servers determines legal jurisdiction, subpoena risk, and your ability to ensure proper deletion. Without clear answers, you cannot verify HIPAA compliance or respond to patient requests for data access.
Storage Risk Assessment
| Storage Factor | Low Risk (Good) | High Risk (Avoid) |
|---|---|---|
| Data Residency | Stored only in US regions (or your jurisdiction) | Stored internationally with unclear data sovereignty |
| Encryption at Rest | AES-256 | Proprietary encryption or no key control |
| Backup Retention | Clearly defined (e.g., 30 days) and deletable on request | Indefinite backups with no deletion mechanism |
| Backup Access | Logged and restricted to emergency use only | Unmonitored or shared access |
How and When is Data Permanently Deleted?
- Why it Matters: HIPAA does not set retention periods for medical records, but the Privacy Rule gives patients the right to access their records and request amendments, so you need to know exactly what the vendor still holds. More importantly, the Security Rule requires policies for the final disposal of electronic PHI when it is no longer needed. Vague deletion policies create liability.
Additional Deletion Practices To Confirm:
- Deletion Verification: Request a description of the deletion process (e.g., overwriting, or logical deletion with retention limits).
- Backup Exclusion: Confirm that new backups stop including deleted data and that old backups are rotated out on a fixed schedule.
- Subprocessor Deletion: Ensure downstream subprocessors are contractually obligated to delete data on the same timeline.
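The backup rotation point comes down to simple date arithmetic: data deleted from the primary store can persist until the last backup that contains it expires. The sketch below assumes the final backup holding the record was taken on the day of deletion and that backups expire after a fixed number of days; the function names, dates, and the 30-, 60-, and 90-day figures are illustrative.

```python
from datetime import date, timedelta


def final_purge_date(deletion_date, backup_retention_days):
    """Date on which the last backup that could still hold the record rotates out.

    Assumes the final backup containing the record was taken on the day of
    deletion and expires exactly `backup_retention_days` later.
    """
    return deletion_date + timedelta(days=backup_retention_days)


def within_certification_window(request_date, deletion_date,
                                backup_retention_days, window_days):
    """True if every copy is gone within `window_days` of the deletion request."""
    deadline = request_date + timedelta(days=window_days)
    return final_purge_date(deletion_date, backup_retention_days) <= deadline


request = date(2025, 1, 1)
deleted = date(2025, 1, 8)  # primary copy removed a week after the request

print(final_purge_date(deleted, 30))                       # 2025-02-07
print(within_certification_window(request, deleted, 30, 60))  # True
print(within_certification_window(request, deleted, 90, 60))  # False
```

With a 90-day backup rotation, a vendor could not truthfully certify full deletion within 60 days of the request, which is the kind of mismatch worth catching before signing.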
The AI’s Memory: Questions About Training Data
These questions focus on whether your clinical notes are used to improve the vendor’s models for other customers.
Is My PHI Used to Train Your Models? If So, Can I Opt Out?
- Why it Matters: If a vendor uses your patient encounters to retrain their AI, those clinical details could resurface in another provider’s output, which would be a direct breach of patient confidentiality.
Even “de‑identified” data carries re‑identification risk, and HIPAA’s de-identification standard (Safe Harbor or Expert Determination) is difficult to achieve with clinical free text.
The Critical Distinction:
- Active learning / Perpetual training: The vendor retains your notes (or derivatives) to improve their model for all customers. Your data directly benefits the vendor’s product, but it also puts your patients’ information at risk of disclosure.
- Ephemeral Processing: The AI generates the note, then the input and output are deleted within a defined short window (e.g., 30 days) and never used for training.
Essential Training Data Questions:
1. By default, are my transcribed notes added to your training corpus?
Ask for the default setting, not just an option. Many vendors opt customers in automatically.
2. Is “anonymization” truly the removal of all 18 HIPAA identifiers, or just name/date scrubbing?
Clinical notes contain indirect identifiers (e.g., unique disease presentations, small population demographics) that are rarely scrubbed. Request a written de‑identification methodology.
3. Do you offer a zero-retention or no-training option? What is the additional cost?
Some vendors may charge a premium for full ephemeral processing.
4. How are voice recordings handled – are they used for voice model training?
Voice prints are biometric identifiers. Using recordings to train speaker-recognition models is a distinct risk that may require separate patient consent under some state biometric privacy laws.
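To see why name and date scrubbing alone falls short, consider a small sketch. The note text is invented, and `naive_scrub` is a hypothetical stand-in for a weak anonymization step, not any vendor's actual method.

```python
import re

# Invented note text; not real patient data.
note = (
    "Jane Doe, seen 03/14/2024. 94-year-old retired lighthouse keeper "
    "from ZIP 04631 with a rare presentation of Erdheim-Chester disease."
)


def naive_scrub(text, patient_name):
    """Remove only the patient's name and slash-formatted dates."""
    text = text.replace(patient_name, "[NAME]")
    return re.sub(r"\b\d{1,2}/\d{1,2}/\d{4}\b", "[DATE]", text)


scrubbed = naive_scrub(note, "Jane Doe")
print(scrubbed)

# Details that survive the scrub and could still narrow down the patient.
leftovers = [t for t in ("94-year-old", "04631", "lighthouse keeper") if t in scrubbed]
print(leftovers)
```

The age over 89 and the full ZIP code are both on the Safe Harbor list of 18 identifiers, and the occupation and rare diagnosis are the kind of indirect identifiers described above. A written de-identification methodology should explain how the vendor handles each of these.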
Your 5-Minute Vendor Vetting Checklist

- Received the full subprocessor list (not just a sample) and confirmed each subprocessor has a signed BAA.
- Confirmed all data stays in US regions – with no international backups or caching outside your jurisdiction.
- Vendor contract includes the right to audit and a 60-day deletion certification – including backups and subprocessor copies.
- Signed agreement that no PHI (even de-identified) will be used for training without opt-in, and the default is off.
- Vendor provides breach notification within 72 hours.
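If you are comparing several vendors, the checklist can be recorded as structured data so every vendor is scored the same way. This is a minimal sketch: the dictionary keys and the sample responses are hypothetical, and unanswered items are deliberately treated as unmet.

```python
CHECKLIST = {
    "subprocessor_list_with_baas": "Full subprocessor list received, each with a signed BAA",
    "us_only_storage": "All data, backups, and caches stay in US regions",
    "audit_and_deletion_cert": "Contract includes audit rights and 60-day deletion certification",
    "no_training_by_default": "No PHI used for training without opt-in; default is off",
    "breach_notice_72h": "Breach notification within 72 hours",
}


def unmet_items(answers):
    """Return descriptions of checklist items the vendor has not satisfied.

    Missing answers count as unmet, so an unanswered question is never a pass.
    """
    return [desc for key, desc in CHECKLIST.items() if not answers.get(key, False)]


# Hypothetical vendor responses; the breach-notification answer is missing.
vendor_answers = {
    "subprocessor_list_with_baas": True,
    "us_only_storage": True,
    "audit_and_deletion_cert": False,
    "no_training_by_default": True,
}

for item in unmet_items(vendor_answers):
    print("UNMET:", item)
```

For this sample vendor, two items come back unmet: the audit and deletion clause, and the breach notification commitment that was never answered.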
Conclusion
Choosing a HIPAA-compliant AI notes tool is not just about accuracy or speed; it is also about operational compliance. Three areas separate responsible vendors from risky ones: subprocessors, storage, and training data. A vendor that cannot provide a live subprocessor list, refuses customer‑managed encryption keys, or defaults to training on your clinical notes should be disqualified regardless of price or features. In contrast, vendors that offer zero‑retention options, 60‑day deletion certification, and signed BAAs with every subcontractor demonstrate genuine accountability. Use the checklist above to standardize your evaluation.

