USECASE · 2026-03-12
Private AI for Healthcare: HIPAA-Aligned Workspaces Without Cloud Lock-in
HIPAA-aligned AI does not require shipping PHI to a third-party model. An on-device-first workspace with optional BYOK cloud fallback, audit logs, RBAC, and a signed BAA where any cloud is touched can satisfy most covered-entity requirements.
What HIPAA actually requires from an AI assistant
*This article is general guidance and not legal or compliance advice; your covered entity's privacy officer is the final word.*
HIPAA was written around the data, not the tool reading it, so an AI assistant is treated like any other workforce member or business associate that touches protected health information. The Privacy Rule governs permitted uses and disclosures of PHI and sets the minimum-necessary standard. The Security Rule requires administrative, physical, and technical safeguards, including access control, audit controls, integrity, transmission security, and a documented risk analysis. The Breach Notification Rule sets the duty to notify affected individuals, HHS, and, for breaches affecting more than 500 residents of a state or jurisdiction, the media when unsecured PHI is disclosed without authorization.
Recent HHS rulemaking has signaled that AI systems creating, receiving, maintaining, or transmitting ePHI must be inventoried and risk-analyzed like any other asset, with encryption in transit and at rest, multi-factor authentication, and timely incident response. The obligation does not change because the entity reading the chart is a model.
Where ChatGPT, Copilot, and Gemini fall short for PHI
Out of the box, the consumer tiers of ChatGPT, Microsoft Copilot, and Google Gemini are not appropriate for PHI. Free and standard subscriptions do not come with a Business Associate Agreement, and prompts may be retained or used for model improvement under the default terms.
The nuance worth keeping straight: each vendor offers enterprise configurations that can be brought under a BAA. OpenAI signs BAAs for the API and for ChatGPT Enterprise or Edu through sales-managed accounts. Microsoft offers BAA coverage for Azure OpenAI Service and certain Microsoft 365 Copilot tenants, though GitHub Copilot and consumer surfaces are excluded. Google supports a BAA for Gemini inside qualifying Workspace tiers, but explicitly excludes NotebookLM and the Gemini-in-Chrome surface.
The practical risk is workforce drift. A clinician who pastes a discharge summary into a free chatbot tab in a browser has, by default, made a disclosure outside any BAA, regardless of what the enterprise contract says.
The on-device-first architecture: local model + opt-in BYOK
An architecture that minimizes HIPAA exposure inverts the usual SaaS pattern. The default model runs on the user's device, typically a quantized open-weights LLM served locally, so prompts and generations never leave the endpoint. When a clinician needs a stronger reasoning model, the workspace can fall back to a cloud provider using the organization's own API key, under that provider's BAA, with an explicit per-request or per-workspace opt-in.
This is the pattern behind osFoundry: a local llama.cpp runtime is the first-class default, and any cloud call is bring-your-own-key against a vendor the covered entity has independently contracted with. There is no shared multi-tenant inference pool sitting between the clinician and the model.
The compliance benefit is concrete. Workloads that can be handled locally never generate a cloud disclosure event. Workloads that require cloud capacity are tracked, attributable, and subject to the BAA the covered entity already vetted.
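The routing rule described above can be sketched in a few lines. This is a minimal illustration, not osFoundry's actual implementation: `WorkspacePolicy`, `Route`, and `choose_route` are hypothetical names, and the three preconditions (opt-in, BAA on file, organization-owned key) are taken from the description in this section.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    LOCAL = "local"
    CLOUD_BYOK = "cloud_byok"


@dataclass(frozen=True)
class WorkspacePolicy:
    # Hypothetical configuration fields, invented for this sketch.
    cloud_opt_in: bool      # workspace-level opt-in to cloud fallback
    baa_on_file: bool       # covered entity holds a BAA with the provider
    has_org_api_key: bool   # the organization's own API key is configured


def choose_route(policy: WorkspacePolicy, needs_stronger_model: bool,
                 request_opt_in: bool = False) -> Route:
    """Return LOCAL unless every precondition for cloud fallback is met."""
    if not needs_stronger_model:
        return Route.LOCAL
    # Opt-in may come from the workspace setting or from this single request.
    opted_in = policy.cloud_opt_in or request_opt_in
    if opted_in and policy.baa_on_file and policy.has_org_api_key:
        return Route.CLOUD_BYOK
    # Any missing precondition fails closed to the local model.
    return Route.LOCAL
```

The design choice worth noting is that the function fails closed: a missing BAA or key never raises the request to the cloud, it simply keeps the work on the device.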
Audit, RBAC, DLP, and minimum-necessary in practice
Technical safeguards are where most AI pilots fail their first internal audit. Four controls do the heavy lifting.
Audit logs must capture who prompted, which model answered, what data sources were attached, and what was returned, with tamper-evident storage and a retention window that matches your record-retention policy. RBAC should bind model access, dataset access, and tool access to job role, so a front-desk account cannot query an oncology corpus and a billing role cannot trigger a clinical-notes generator.
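As a rough sketch of how these two controls fit together, the example below pairs a deny-by-default role check with a hash-chained log, which is one common way to make storage tamper-evident. The role names, resource names, and the `AuditLog` class are hypothetical; a production system would pull roles from a directory and write to hardened storage rather than an in-memory list.

```python
import hashlib
import json
from datetime import datetime, timezone

# Hypothetical role-to-resource grants, invented for illustration.
ROLE_GRANTS = {
    "front_desk": {"scheduling_corpus"},
    "billing": {"billing_codes"},
    "oncology_clinician": {"oncology_corpus", "clinical_notes_generator"},
}


def is_allowed(role: str, resource: str) -> bool:
    """Deny by default: unknown roles and unlisted resources are refused."""
    return resource in ROLE_GRANTS.get(role, set())


class AuditLog:
    """Append-only log in which each entry's hash covers the previous hash."""

    GENESIS = "0" * 64

    def __init__(self) -> None:
        self.entries: list[dict] = []

    @staticmethod
    def _digest(body: dict, prev_hash: str) -> str:
        payload = json.dumps(body, sort_keys=True) + prev_hash
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def append(self, user: str, role: str, model: str,
               sources: list[str], response: str, allowed: bool) -> dict:
        prev_hash = self.entries[-1]["hash"] if self.entries else self.GENESIS
        body = {
            "ts": datetime.now(timezone.utc).isoformat(),
            "user": user, "role": role, "model": model,
            "sources": sources, "response": response, "allowed": allowed,
        }
        entry = {**body, "prev_hash": prev_hash,
                 "hash": self._digest(body, prev_hash)}
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute the chain; any edited or reordered entry breaks it."""
        prev_hash = self.GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items()
                    if k not in ("hash", "prev_hash")}
            if entry["prev_hash"] != prev_hash:
                return False
            if entry["hash"] != self._digest(body, prev_hash):
                return False
            prev_hash = entry["hash"]
        return True
```

Denied requests are worth logging too, since a front-desk account repeatedly attempting to reach a clinical corpus is itself a signal. Note that a log holding prompts and responses contains PHI and needs the same protection as the chart.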
DLP belongs at the prompt boundary. Pattern- and classifier-based redaction of identifiers before any cloud call enforces the minimum-necessary standard programmatically rather than by training alone. Per-workspace data residency keeps one practice's data from co-mingling with another's.
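A pattern-based redactor at the prompt boundary might look like the sketch below. The four patterns are illustrative assumptions about US-style formats and are far from a complete identifier list; a real deployment would combine patterns with a classifier, as the paragraph above describes, and the `MRN` prefix format in particular is invented for the example.

```python
import re

# Illustrative patterns only; assumed formats, not a complete identifier list.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b"),
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "MRN": re.compile(r"\bMRN[:#]?\s*\d{6,10}\b", re.IGNORECASE),
}


def redact(prompt: str) -> tuple[str, dict[str, int]]:
    """Replace matches with a bracketed label and count hits per label."""
    counts: dict[str, int] = {}
    redacted = prompt
    for label, pattern in PATTERNS.items():
        redacted, hits = pattern.subn(f"[{label}]", redacted)
        if hits:
            counts[label] = hits
    return redacted, counts


if __name__ == "__main__":
    text = "Pt MRN: 00123456, SSN 123-45-6789, call 555-123-4567."
    clean, found = redact(text)
    print(clean)   # Pt [MRN], SSN [SSN], call [PHONE].
    print(found)
```

Returning the hit counts alongside the redacted text lets the workspace record DLP triggers in the audit log without storing the identifiers themselves.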
None of these are AI-specific inventions, but together they are what lets a privacy officer sign off on a model touching a chart.
Three workflows: intake notes, prior auth, ops Q&A
Intake and visit notes are the highest-value local workload. A clinician dictates, a local model drafts a structured note, and nothing leaves the device until the clinician signs and the EHR receives a posted record over its existing interface. Latency is acceptable on a modern laptop, and the PHI surface area is limited to the device.
Prior authorization benefits from retrieval. The model assembles a draft letter from the patient chart and the payer's published medical-policy documents, with citations. If a stronger cloud model is invoked for drafting, the chart excerpt is de-identified at the prompt boundary and re-identified on the device after the response returns.
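The de-identify and re-identify round trip can be sketched with a placeholder map that never leaves the device. This is a simplified illustration that assumes the identifiers to mask are already known (for example, pulled from structured chart fields); `deidentify`, `reidentify`, and the `[[ID_n]]` token format are hypothetical, and simple string replacement will miss misspellings or reformatted values.

```python
def deidentify(text: str, identifiers: list[str]) -> tuple[str, dict[str, str]]:
    """Swap each known identifier for a placeholder token.

    Returns the masked text plus the token-to-value mapping, which stays
    on the device and is never sent with the prompt.
    """
    mapping: dict[str, str] = {}
    masked = text
    # Longest values first so "Jane Doe" is replaced before "Jane".
    ordered = sorted(set(identifiers), key=lambda v: (-len(v), v))
    for index, value in enumerate(ordered):
        token = f"[[ID_{index}]]"
        if value in masked:
            masked = masked.replace(value, token)
            mapping[token] = value
    return masked, mapping


def reidentify(text: str, mapping: dict[str, str]) -> str:
    """Restore original values in a response that echoes the tokens."""
    for token, value in mapping.items():
        text = text.replace(token, value)
    return text


if __name__ == "__main__":
    chart = "Jane Doe (DOB 1980-02-03) needs an MRI. Jane reports pain."
    masked, mapping = deidentify(chart, ["Jane Doe", "Jane", "1980-02-03"])
    print(masked)
    # A cloud model would see only the masked text and return tokens intact.
    draft = "Prior authorization request for [[ID_1]], born [[ID_0]]."
    print(reidentify(draft, mapping))
```

The approach depends on the cloud model returning the tokens unchanged, so it is worth checking each response for missing or altered placeholders before re-identifying.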
Operational Q&A across HR policy, billing codes, and SOPs rarely needs PHI at all and can run on either local or cloud models, provided PHI-blocking DLP is in place. Splitting workloads this way keeps the highest-risk traffic on the device and the lowest-risk traffic in the cloud.
BAA and vendor risk: what to ask any AI vendor
Before signing, work through a short list. Will the vendor execute a BAA covering the exact product surface you will use, not a sibling product? Which specific endpoints, models, and consoles are in scope, and which are excluded in writing? What is the data retention default for prompts, completions, embeddings, and fine-tuning artifacts, and can it be set to zero?
Ask for the subprocessor list and the mechanism for notice when it changes. Confirm encryption in transit and at rest, key management posture, and whether customer-managed keys are available. Require breach notification timelines that let you meet your own 60-day patient notification obligation with margin.
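To make "with margin" concrete, the arithmetic can be checked before signing. The sketch below takes the conservative assumption that the 60-day clock starts when the vendor discovers the incident; whether that is the right starting point for your organization is a question for counsel. The function names and the example numbers are hypothetical.

```python
from datetime import date, timedelta

PATIENT_NOTICE_DAYS = 60  # the covered entity's outer limit, per the text above


def notification_margin(vendor_notice_days: int, internal_prep_days: int,
                        obligation_days: int = PATIENT_NOTICE_DAYS) -> int:
    """Days of slack left after the vendor's notice window and your own prep.

    A negative result means the contract terms cannot meet the obligation.
    """
    return obligation_days - vendor_notice_days - internal_prep_days


def latest_patient_notice(vendor_discovered_on: date,
                          obligation_days: int = PATIENT_NOTICE_DAYS) -> date:
    """Outer deadline, assuming the clock starts at vendor discovery."""
    return vendor_discovered_on + timedelta(days=obligation_days)


if __name__ == "__main__":
    # Hypothetical terms: vendor notifies within 10 days, you need 30 to prepare.
    print(notification_margin(vendor_notice_days=10, internal_prep_days=30))
    print(latest_patient_notice(date(2026, 3, 1)))
```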
Request the most recent SOC 2 Type II and any HITRUST attestation, the penetration testing cadence, and the incident response runbook. Finally, confirm in writing that the vendor will not train shared models on your tenant's data.
Pilot rollout in 30 days
Week one is scoping. The privacy officer, a clinical sponsor, and IT pick two or three workflows, write the data-flow diagram, and update the risk analysis. Identify which steps can be local-only and which require cloud, and confirm the BAA coverage for any cloud surface.
Week two is provisioning. Stand up the workspace with RBAC roles mapped to job titles, enable audit logging to your SIEM, configure DLP redaction rules at the prompt boundary, and load the local model on pilot devices. Document the rollback plan.
Week three is a supervised pilot with five to ten clinicians. Track time saved, error rate against a human-reviewed gold set, and any DLP triggers. Hold a mid-week review with the privacy officer.
Week four is the go or no-go. Update policy and training materials, finalize the AI-use disclosure to patients where applicable, and decide on expansion criteria for the next cohort.
Frequently asked questions
- Does running a model on-device avoid the need for a BAA entirely?
- For the local inference itself, yes, because no PHI is disclosed to a third party. You still need to account for the software vendor's role. If the local runtime, the workspace shell, or any management plane could receive PHI through telemetry, logs, or support sessions, the vendor is acting as a business associate and a BAA is appropriate. The safe pattern is to confirm in writing that the vendor receives no PHI from the device by default and to disable any optional telemetry that could carry content. Cloud fallbacks always require a BAA with the cloud model provider.
- Can we use ChatGPT, Copilot, or Gemini in a HIPAA workflow at all?
- Yes, but only on the specific enterprise configurations the vendor has agreed to cover under a BAA, and only for the surfaces explicitly named in that agreement. OpenAI covers the API and certain ChatGPT Enterprise or Edu tenants. Microsoft covers Azure OpenAI Service and qualifying Copilot tenants. Google covers Gemini in qualifying Workspace tiers while excluding NotebookLM and Gemini in Chrome. Free and standard consumer tiers are not covered. The harder problem is workforce behavior: blocking the consumer surfaces at the network or browser layer matters as much as the contract.
- How do audit logs differ for AI versus traditional EHR access?
- Traditional EHR audit logs capture record access by user, time, and action. AI audit logs need to capture the prompt, the model and version, the retrieved context, the response, the user, the workspace, and any tools the model invoked. They also need to capture cloud routing decisions when a BYOK fallback is used. Retain these alongside EHR logs under the same retention policy, store them in a tamper-evident store, and feed them into the same SIEM your security team already monitors so AI activity is part of normal incident response rather than a parallel silo.
- What about the new 2026 Security Rule changes?
- Recent HHS rulemaking moves several previously addressable safeguards into required ones, with stronger emphasis on encryption, multi-factor authentication, asset inventory that includes AI systems handling ePHI, and tighter incident reporting timelines. Treat any AI assistant as an in-scope asset for risk analysis, document the data flows, and confirm your incident response runbook can meet the shorter notification windows. Coordinate with your privacy officer on the exact effective dates and the transition expectations applicable to your organization, since enforcement posture and any extensions can shift after publication.