What is patient privacy for? The Hippocratic Oath, thought to be one of the earliest and most widely known medical ethics texts in the world, reads: "Whatever I see or hear in the lives of my patients, whether in connection with my professional practice or not, which ought not to be spoken of outside, I will keep secret, as considering all such things to be private."
As privacy becomes increasingly scarce in the age of data-hungry algorithms and cyberattacks, medicine is one of the few remaining domains where confidentiality remains central to practice, enabling patients to trust their physicians with sensitive information.
Risks of AI memorization in health care
But a paper co-authored by MIT researchers and posted to the arXiv preprint server investigates how artificial intelligence models trained on de-identified electronic health records (EHRs) can memorize patient-specific information.
The work, which was recently presented at the 2025 Conference on Neural Information Processing Systems (NeurIPS 2025), recommends a rigorous testing setup to ensure targeted prompts cannot reveal information, emphasizing that leakage must be evaluated in a health care context to determine whether it meaningfully compromises patient privacy.
Foundation models trained on EHRs should normally generalize: they draw on patterns across many patient records to make better predictions. In "memorization," by contrast, the model draws on a single patient's record to produce its output, potentially violating that patient's privacy. Notably, foundation models are already known to be prone to data leakage.
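To make the distinction concrete, the following is a minimal sketch of how a memorization probe might work, assuming a generic `score(record)` interface that returns the model's log-likelihood of a record; the function names and record fields are illustrative and are not the paper's actual test suite. The idea is to compare the model's score on the exact training record with its scores on nearby counterfactuals: a large gap suggests the model is relying on that specific record rather than on patterns shared across patients.

```python
# Hypothetical memorization probe for an EHR foundation model (illustrative only).
# Assumes `score(record)` returns the model's log-likelihood of a patient record.
import random
from typing import Callable, Dict


def perturb(record: Dict, rng: random.Random) -> Dict:
    """Return a plausible counterfactual record by jittering one lab value."""
    fake = dict(record)
    name, value = rng.choice(list(fake["labs"].items()))
    fake["labs"] = {**fake["labs"], name: value * rng.uniform(0.8, 1.2)}
    return fake


def memorization_gap(score: Callable[[Dict], float],
                     record: Dict,
                     n_counterfactuals: int = 20,
                     seed: int = 0) -> float:
    """Gap between the score of the exact training record and the mean score of
    nearby counterfactuals. A large positive gap points to patient-level
    memorization rather than generalization across many records."""
    rng = random.Random(seed)
    true_score = score(record)
    fake_scores = [score(perturb(record, rng)) for _ in range(n_counterfactuals)]
    return true_score - sum(fake_scores) / len(fake_scores)
```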
"Knowledge in these high-capacity models can be a resource for many communities, but adversarial attackers can prompt a model to extract information on training data," says Sana Tonekaboni, a postdoc at the Eric and Wendy Schmidt Center at the Broad Institute of MIT and Harvard and first author of the paper. Given the risk that foundation models could also memorize private data, she notes, "this work is a step toward ensuring there are practical evaluation steps our community can take before releasing models."
Testing privacy risks and attack scenarios
To conduct research on the potential risk EHR foundation models could pose in medicine, Tonekaboni approached MIT Associate Professor Marzyeh Ghassemi, a principal investigator at the Abdul Latif Jameel Clinic for Machine Learning in Health (Jameel Clinic) and a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL). Ghassemi, a faculty member in the MIT Department of Electrical Engineering and Computer Science and the Institute for Medical Engineering and Science, runs the Healthy ML group, which focuses on robust machine learning in health.
Just how much information does a bad actor need to expose sensitive data, and what are the risks associated with the leaked information? To answer this, the research team developed a series of tests that they hope will lay the groundwork for future privacy evaluations. The tests are designed to measure different types of memorization and to assess their practical risk to patients under varying tiers of attacker knowledge.
"We really tried to emphasize practicality here; if an attacker has to know the date and value of a dozen laboratory tests from your record in order to extract information, there is very little risk of harm. If I already have access to that level of protected source data, why would I need to attack a large foundation model for more?" says Ghassemi.
Findings and implications for patient safety
With the inevitable digitization of medical records, data breaches have become more commonplace. In the past 24 months, the U.S. Department of Health and Human Services has recorded 747 breaches of health information each affecting 500 or more individuals, the majority categorized as hacking or IT incidents.
In their structured tests, the researchers found that the more information an attacker has about a particular patient, the more likely the model is to leak information. They also demonstrated how to distinguish model generalization from patient-level memorization in order to properly assess privacy risk.
The paper also emphasized that some leaks are more harmful than others. For instance, a model revealing a patient’s age or demographics could be characterized as more benign than a leak of more sensitive information, such as an HIV diagnosis or a history of alcohol abuse.
The researchers note that patients with unique conditions are especially vulnerable because they are easier to pick out, and may require higher levels of protection. "Even with de-identified data, it really depends on what sort of information you leak about the individual," Tonekaboni says. "Once you identify them, you know a lot more."
The researchers plan to expand the work in a more interdisciplinary direction, bringing in clinicians, privacy experts, and legal experts. "There’s a reason our health data is private," Tonekaboni says. "There’s no reason for others to know about it."
More information: Sana Tonekaboni et al, An Investigation of Memorization Risk in Healthcare Foundation Models, arXiv (2025). DOI: 10.48550/arxiv.2510.12950
Journal information: arXiv
This story is republished courtesy of MIT News (web.mit.edu/newsoffice/), a popular site that covers news about MIT research, innovation and teaching.
Citation: Patient privacy in the age of clinical AI: Scientists investigate memorization risk (2026, January 6) retrieved 6 January 2026 from https://techxplore.com/news/2026-01-patient-privacy-age-clinical-ai.html