What to Ask Your AI Notes Vendor About Subprocessors, Storage, and Training Data

Essential questions on subprocessors, storage, and training data for AI notes

Abstract node diagram of an AI notes vendor's data supply chain — a central vendor branching to subprocessors and downstream services, with one highlighted node marking a subprocessor to scrutinize.

When evaluating an AI notes vendor for clinical use, three areas directly affect HIPAA compliance: subprocessors, data storage, and model training. Subprocessors are third‑party vendors that may access patient information. Storage policies determine where data resides, how it is encrypted, and when it is deleted. Training data practices determine whether clinical notes are used to improve vendor models. Without clear answers in each area, covered entities take on risk they cannot measure. This guide lists specific questions to ask any AI notes vendor about all three, so you can conduct a thorough vendor assessment.

Three columns of vendor due-diligence questions for AI notes — subprocessors (who they are, BAAs, change notifications), storage (location, encryption, retention), and training data (whether models train on your data or PHI, and opt-out).

The Vendor’s Supply Chain: Questions About Subprocessors

These questions aim to uncover third‑party vendors that access PHI without your knowledge.

Who Are Your Subprocessors, and Have They All Signed BAAs?

  • Why it Matters: Your primary AI notes vendor might be fully HIPAA compliant on paper, but their subprocessors (such as cloud infrastructure providers, speech-to-text engines, or error-logging services) often represent the greatest unmanaged risk.

Subprocessor Questions:

  1. Will you provide your complete list of subprocessors?
  2. Which subprocessors have direct or indirect access to unredacted PHI? (Indirect access includes logging, caching, or backup systems.)
  3. May I review your Business Associate Agreements (BAAs) with each subprocessor?
  4. How often is your subprocessor list updated, and how are customers notified?
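
The answers to these questions can be kept as a simple register and checked automatically. The following is a minimal sketch, assuming a hypothetical record layout (`Subprocessor`, `phi_access`, `baa_signed`) and made-up vendor names; it is not a format any vendor is known to provide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subprocessor:
    """One row of a vendor's subprocessor list (hypothetical schema)."""
    name: str
    function: str
    phi_access: str   # "direct", "indirect", or "none"
    baa_signed: bool


def missing_baas(subprocessors):
    """Return names of subprocessors that touch PHI without a signed BAA."""
    return [
        s.name
        for s in subprocessors
        if s.phi_access in ("direct", "indirect") and not s.baa_signed
    ]


# Illustrative data only; these are not real vendors.
register = [
    Subprocessor("ExampleCloud", "storage and compute", "direct", True),
    Subprocessor("ExampleASR", "speech-to-text", "direct", True),
    Subprocessor("ExampleLogs", "error tracking", "indirect", False),
    Subprocessor("ExampleMetrics", "usage analytics", "none", False),
]

print(missing_baas(register))  # ['ExampleLogs']
```

Treating indirect access (logging, caching, backups) the same as direct access is a deliberate choice here, since question 2 asks about both.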

Common Subprocessor Types in AI Notes Tools

Understanding who typically sits in the supply chain helps you ask better follow‑up questions.

| Subprocessor Category | Example Function | PHI Access Risk Level |
| --- | --- | --- |
| Cloud infrastructure (AWS, Azure) | Data storage, compute, backups | High (if unencrypted or key access exists) |
| Speech-to-text / ASR engine | Convert audio to a transcript | Very high (direct access to patient voice + clinical content) |
| LLM hosting platform (OpenAI, Anthropic, etc.) | Generate summary notes | High (unless a private, zero-retention instance) |
| Error tracking and logging | Capture debugging data | Medium (often captures snippets of text) |
| Analytics & monitoring (Mixpanel, Datadog) | Usage metrics, performance | Low to medium |

Location, Encryption, and Retention: Questions About Storage

These questions focus on data residency, at‑rest protection, and deletion protocols.

Where Is My Data Stored?

  • Why it Matters: Even if a vendor encrypts data, the physical location of servers determines legal jurisdiction, subpoena risk, and your ability to ensure proper deletion. Without clear answers, you cannot verify HIPAA compliance or respond to patient requests for data access.

Storage Risk Assessment

| Storage Factor | Low Risk (Good) | High Risk (Avoid) |
| --- | --- | --- |
| Data Residency | Stored only in US regions (or your jurisdiction) | Stored internationally with unclear data sovereignty |
| Encryption at Rest | AES-256 | Proprietary encryption or no key control |
| Backup Retention | Clearly defined (e.g., 30 days) and deletable on request | Indefinite backups with no deletion mechanism |
| Backup Access | Logged and restricted to emergency use only | Unmonitored or shared access |

How and When is Data Permanently Deleted?

  • Why it Matters: HIPAA does not set retention periods for medical records (state law usually does), but the Privacy Rule gives patients the right to access their records and request amendments, so you need to know what the vendor still holds. The Security Rule also requires policies for the final disposal of electronic PHI. Vague deletion policies create liability on both counts.

Additional Deletion Practices To Confirm:

  • Deletion Verification: Request a description of the deletion process (e.g., overwriting, or logical deletion with retention limits).
  • Backup Exclusion: Confirm that new backups stop including deleted data and that old backups are rotated out on a fixed schedule.
  • Subprocessor Deletion: Ensure downstream subprocessors are contractually obligated to delete data on the same timeline.
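
The backup rotation point has simple arithmetic behind it: data is not fully gone until the last backup containing it has rotated out. The sketch below assumes a hypothetical vendor that deletes live data some days after your request and rotates backups on a fixed schedule. The function names and the 60-day default window are illustrative assumptions, not terms from any particular contract.

```python
from datetime import date, timedelta


def full_purge_date(request_date, live_deletion_lag_days, backup_retention_days):
    """Date by which the record is gone from live systems and all backups.

    Worst case assumed: a backup is taken immediately before the live
    deletion, so it survives for the full backup retention period.
    """
    live_deleted = request_date + timedelta(days=live_deletion_lag_days)
    return live_deleted + timedelta(days=backup_retention_days)


def within_certification_window(request_date, live_deletion_lag_days,
                                backup_retention_days, window_days=60):
    """True if the worst-case purge date falls inside the certification window."""
    purge = full_purge_date(request_date, live_deletion_lag_days,
                            backup_retention_days)
    return (purge - request_date).days <= window_days


requested = date(2025, 1, 1)
print(full_purge_date(requested, 7, 30))              # 2025-02-07
print(within_certification_window(requested, 7, 30))  # True
print(within_certification_window(requested, 7, 90))  # False
```

A vendor with 90-day backup rotation cannot honestly certify deletion within 60 days unless it can also purge individual records from backups, which is worth asking about directly.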

The AI’s Memory: Questions About Training Data

These questions address whether your clinical notes are used to improve the vendor’s models for other customers.

Is My PHI Used to Train Your Models? If So, Can I Opt Out?

  • Why it Matters: If a vendor uses your patient encounters to retrain their AI, those clinical details could resurface in another provider’s output, a direct breach of patient confidentiality.

Even “de‑identified” data carries re‑identification risk, and HIPAA’s de-identification standard (Safe Harbor or Expert Determination) is difficult to achieve with clinical free text.

The Critical Distinction:

  • Active learning / Perpetual training: The vendor retains your notes (or derivatives) to improve their model for all customers. Your data directly benefits the vendor’s product while exposing your patients to leakage and re‑identification risk.
  • Ephemeral Processing: The AI generates the note, then the input and output are deleted within a defined short window (e.g., 30 days) and never used for training.

Essential Training Data Questions:

1. By default, are my transcribed notes added to your training corpus?

Ask for the default setting, not just an option. Many vendors opt customers in automatically.

2. Is “anonymization” truly the removal of all 18 HIPAA identifiers, or just name/date scrubbing?

Clinical notes contain indirect identifiers (e.g., unique disease presentations, small population demographics) that are rarely scrubbed. Request a written de‑identification methodology.

3. Do you offer a zero-retention or no-training option? What is the additional cost?

Some vendors may charge a premium for full ephemeral processing.

4. How are voice recordings handled – are they used for voice model training?

Voice recordings can serve as biometric identifiers. Using them to train speaker‑recognition models is a distinct risk that may require separate patient consent under some state biometric privacy laws.
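
Question 2 is easier to press when you can show why name and date scrubbing falls short. The sketch below is a deliberately naive scrubber written for illustration only. The function `naive_scrub`, the patient name, and the note text are all invented, and this is not how any specific vendor de-identifies data.

```python
import re


def naive_scrub(text, known_names):
    """Replace slash-formatted dates and a list of known names. Nothing else."""
    scrubbed = re.sub(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b", "[DATE]", text)
    for name in known_names:
        scrubbed = re.sub(re.escape(name), "[NAME]", scrubbed)
    return scrubbed


# Fictional note; every detail is made up.
note = (
    "Maria Lopez seen 03/14/2024. 34-year-old lighthouse keeper from a "
    "town of 900 with Erdheim-Chester disease."
)

scrubbed = naive_scrub(note, ["Maria Lopez"])
print(scrubbed)
# [NAME] seen [DATE]. 34-year-old lighthouse keeper from a town of 900 with Erdheim-Chester disease.
```

The name and date are gone, yet the remaining age, occupation, town size, and rare diagnosis could still point to one person. That gap is what a written de‑identification methodology should address.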

Your 5-Minute Vendor Vetting Checklist

A five-minute AI notes vendor vetting checklist: signed BAA with vendor and subprocessors, US data location, TLS and AES-256 encryption, audio retention or deletion, no training on PHI, and audit logs with least-privilege access.

  1. Received the full subprocessor list, not just a sample, and confirmed that each subprocessor has a signed BAA.
  2. Confirmed all data stays in US regions – with no international backups or caching outside your jurisdiction.
  3. Vendor contract includes the right to audit and a 60-day deletion certification – including backups and subprocessor copies.
  4. Signed agreement that no PHI (even de-identified) will be used for training without opt-in, and the default is off.
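  5. Vendor contractually commits to breach notification within 72 hours of discovery. HIPAA itself allows business associates up to 60 days, so a shorter window must be written into the BAA.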
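
To apply the same checklist to every vendor, the five items can be recorded as yes/no answers and the gaps listed automatically. This is a minimal sketch; the class `VendorAnswers` and its field names are hypothetical labels chosen to mirror the checklist above.

```python
from dataclasses import dataclass, fields


@dataclass
class VendorAnswers:
    """Yes/no result for each checklist item (hypothetical field names)."""
    full_subprocessor_list_with_baas: bool
    data_stays_in_us_regions: bool
    audit_right_and_deletion_certification: bool
    no_training_on_phi_by_default: bool
    breach_notice_within_72_hours: bool


def failed_items(answers):
    """Return the checklist items the vendor did not satisfy, in list order."""
    return [f.name for f in fields(answers) if not getattr(answers, f.name)]


# Illustrative answers for an imaginary vendor.
vendor = VendorAnswers(
    full_subprocessor_list_with_baas=True,
    data_stays_in_us_regions=True,
    audit_right_and_deletion_certification=False,
    no_training_on_phi_by_default=True,
    breach_notice_within_72_hours=False,
)

print(failed_items(vendor))
# ['audit_right_and_deletion_certification', 'breach_notice_within_72_hours']
```

An empty list means the vendor passed all five items. Anything else gives you a concrete follow-up list for the next call.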

Conclusion

Choosing a HIPAA-compliant AI notes tool is not just about accuracy or speed; it is also about operational compliance. Three areas separate responsible vendors from risky ones: subprocessors, storage, and training data. A vendor that cannot provide a live subprocessor list, refuses customer‑managed encryption keys, or defaults to training on your clinical notes should be disqualified regardless of price or features. In contrast, vendors that offer zero‑retention options, 60‑day deletion certification, and signed BAAs with every subcontractor demonstrate genuine accountability. Use the checklist above to standardize your evaluation.

Frequently asked questions

  • What happens if an AI notes vendor adds a new subprocessor without notifying me?

    You could unknowingly expose patient data to a third party that has not signed a Business Associate Agreement (BAA) or that operates outside HIPAA compliance.

    • Contractual Protection: The vendor’s agreement should require written notice before adding any subprocessor that will access PHI.
    • Practical Risk: A new cloud logging tool, analytics platform, or transcription engine could inadvertently capture PHI in debug data. You would only learn of the breach after it occurs.
    • Best Practice: Demand a contractual provision that any new subprocessor interacting with PHI triggers a mandatory notice and a 30-day right for you to terminate without penalty.

  • What is the difference between “zero-retention” and “no-training” when evaluating AI notes vendors?

    These terms are often confused but address separate risks:

    • Zero-Retention (ephemeral processing): Data is deleted after the note is generated and delivered. This minimizes storage risk but does not guarantee that the data wasn't used for training before deletion. Always confirm the order of operations.
    • No-Training (inference-only): The vendor processes your data only to produce a note and does not use it to train or improve their AI model. This addresses training risk but does not by itself limit how long the data is stored.

  • What encryption standards should I look for when evaluating an AI notes vendor's storage?

    Encryption protects patient data both while it travels to the vendor and while it rests on their servers. Without strong, documented encryption, your PHI is vulnerable to interception, theft, or unauthorized internal access.

    • In-transit Encryption: The vendor should use TLS 1.2 or later, preferably TLS 1.3, for all data moving between your device and their systems.
    • At-rest Encryption: AES-256 is the current standard.
    • Key Management: Vendor-managed keys are common, but customer-managed keys (CMKs) give you the ability to revoke access instantly upon contract termination.
    • Verification: Request a copy of their encryption policy or a third-party audit (e.g., SOC 2 Type II) confirming these standards are applied consistently.
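
The in-transit claim is one of the few you can spot-check yourself. The sketch below uses Python's standard `ssl` module to build a client that refuses connections below a chosen TLS version and reports what was negotiated. The helper names are hypothetical and the default minimum of TLS 1.3 is an assumption you can relax.

```python
import socket
import ssl


def strict_tls_context(minimum=ssl.TLSVersion.TLSv1_3):
    """Build a client context that refuses anything older than `minimum`."""
    context = ssl.create_default_context()  # certificate and hostname checks on
    context.minimum_version = minimum
    return context


def negotiated_tls_version(host, port=443,
                           minimum=ssl.TLSVersion.TLSv1_3, timeout=5.0):
    """Connect to `host` and return the negotiated protocol name.

    Raises ssl.SSLError if the server cannot meet `minimum`.
    """
    context = strict_tls_context(minimum)
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version()
```

Calling `negotiated_tls_version("api.vendor.example")` (a placeholder hostname) against the vendor's real endpoint returns a string such as `'TLSv1.3'` when the handshake succeeds. This only tests the public endpoint, so encryption between the vendor and its subprocessors still has to be confirmed through documentation or an audit report.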
