
Published: June 26, 2026
De-identification and anonymization are not the same thing. Under HIPAA, de-identification removes the 18 Safe Harbor identifiers (or relies on Expert Determination) so the data is no longer Protected Health Information, while some residual re-identification risk is tolerated. Under GDPR, anonymization is a higher bar: the data must be irreversibly altered so that no individual can be re-identified by any means reasonably likely to be used, at which point GDPR no longer applies. The real-world data, AI/ML training sets, and postmarket telemetry that medical device manufacturers handle are usually de-identified, not anonymized, and the FDA expects that distinction to be documented.
Medical device teams routinely use "de-identified" and "anonymized" interchangeably in submissions, validation reports, and AI/ML training documentation. The two terms describe different legal standards, different technical thresholds, and different residual obligations. Getting the distinction wrong shows up in FDA AI/ML deficiency letters, in IRB findings, and in EU MDR technical files. This post clarifies the definitions, maps them to HIPAA Safe Harbor, HIPAA Expert Determination, GDPR Recital 26, and FDA AI/ML expectations, and explains where manufacturers most often slip.
Key Takeaways
- HIPAA de-identification (Safe Harbor or Expert Determination) tolerates a small, documented re-identification risk; the data is no longer PHI but the obligation to track the methodology remains.
- GDPR anonymization is a higher bar: the alteration must be irreversible, with no reasonably likely path to re-identification. Once that bar is met, GDPR no longer applies to the data set.
- Pseudonymization is not anonymization; under GDPR pseudonymized data is still personal data and still in scope.
- The FDA expects AI/ML submissions to describe the de-identification method, residual risk, and how identifiers are kept out of training and re-training pipelines.
- Most medical device "anonymized" data sets are technically de-identified; misusing the term is a common deficiency pattern.
Table of Contents
- How HIPAA Defines De-Identification
- How GDPR Defines Anonymization
- Pseudonymization Is a Third, Distinct Concept
- What the FDA Expects in AI/ML Submissions
- Common Re-Identification Risks in Device Data
- How Blue Goat Approaches De-Identification for MedTech
- FAQ
Why this matters
Medical devices generate identifiable data at every stage: clinical evaluation, real-world performance studies, postmarket telemetry, AI/ML training corpora, and customer support cases. Each downstream use falls under a different legal regime (HIPAA in the US, GDPR in the EU, PIPEDA in Canada, plus institutional IRB rules), and each regime defines "no longer identifiable" differently. The FDA's February 3, 2026 final premarket cybersecurity guidance, the FDA's AI/ML guidance series, and the EU MDR technical documentation expectations all require manufacturers to document how patient data is handled across the product lifecycle. The HIPAA de-identification standard at 45 CFR 164.514, together with HHS's accompanying de-identification guidance, and Opinion 05/2014 on Anonymisation Techniques from the Article 29 Working Party (the European Data Protection Board's predecessor) remain the canonical references. A submission that claims "anonymized" data when the methodology meets only HIPAA Safe Harbor is technically inaccurate and creates risk during EU MDR review, IRB scrutiny, and FDA review of AI/ML re-training plans.
How HIPAA Defines De-Identification
Safe Harbor
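HIPAA's Safe Harbor method (45 CFR 164.514(b)(2)) lists 18 identifiers, of the individual or of relatives, employers, or household members, that must be removed. They include names; geographic subdivisions smaller than a state (the first three ZIP code digits may be kept if that combined area holds more than 20,000 people, otherwise they become 000); all elements of dates except year for dates directly related to the individual, with all ages over 89 aggregated into a single 90-or-older category; telephone numbers; email addresses; medical record numbers; device identifiers and serial numbers; biometric identifiers; full-face photographs; and any other unique identifying number, characteristic, or code. Once all 18 are removed and the covered entity has no actual knowledge that the remaining information could be used alone or in combination to re-identify an individual, the data is no longer PHI.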
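To make the date, age, and ZIP rules concrete, here is a minimal Python sketch of how a pipeline might generalize those three fields for a single record. It assumes hypothetical field names (`birth_date`, `event_date`, `zip`) and a caller-supplied set of restricted three-digit ZIP prefixes; that set has to be derived from current Census data and is deliberately not hard-coded. It covers only three of the 18 identifiers and is not a complete Safe Harbor implementation.

```python
from datetime import date


def safe_harbor_generalize(record: dict, restricted_zip3: set[str], today: date) -> dict:
    """Generalize three Safe Harbor fields (dates, age, ZIP) for one record.

    Hypothetical input fields: 'birth_date' (date), 'event_date' (date), 'zip' (str).
    restricted_zip3: caller-supplied 3-digit prefixes covering 20,000 or fewer
    people, taken from current Census data (not hard-coded here).
    Any field not written to `out` (names, serials, contact details) is dropped.
    """
    out = {}
    # Dates directly related to the individual: keep the year only.
    out["event_year"] = record["event_date"].year

    # Age: aggregate everyone over 89 into one category and suppress the
    # birth year, since it would reveal the exact age.
    birth = record["birth_date"]
    had_birthday = (today.month, today.day) >= (birth.month, birth.day)
    age = today.year - birth.year - (0 if had_birthday else 1)
    if age > 89:
        out["age"] = "90+"
        out["birth_year"] = None
    else:
        out["age"] = age
        out["birth_year"] = birth.year

    # ZIP: keep the first three digits unless that prefix is restricted.
    zip3 = record["zip"][:3]
    out["zip3"] = "000" if zip3 in restricted_zip3 else zip3
    return out
```

Building the output from an explicit set of fields, rather than deleting identifiers from the input, means anything the code does not know about is dropped by default.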
Expert Determination
The Expert Determination method (45 CFR 164.514(b)(1)) allows a qualified statistical expert to apply and document methods that render the risk of re-identification "very small." There is no fixed numeric threshold in the regulation; HHS guidance points to expert judgment with a documented analysis. Expert Determination is the typical path when device data sets need to retain higher granularity (full dates, device serial numbers, geographic detail) than Safe Harbor allows.
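Because the regulation sets no numeric threshold, experts choose and justify their own metrics. One widely used family is based on equivalence classes: group records that share the same quasi-identifier values and look at how small the groups get. The Python sketch below computes k (the smallest class size), a worst-case risk of 1/k, and an average risk. Which fields count as quasi-identifiers, and what level of risk is acceptable, remain the expert's judgment; the field names in the test data are hypothetical.

```python
from collections import Counter


def equivalence_class_risk(records: list[dict], quasi_ids: list[str]) -> dict:
    """Equivalence-class risk metrics over the chosen quasi-identifiers.

    k               smallest class size (the data set is k-anonymous for this k)
    max_risk        1 / k, the worst-case chance of matching a known target
    avg_risk        mean of 1 / class size across records (= classes / records)
    unique_records  records that are alone in their class
    """
    sizes = Counter(tuple(r[q] for q in quasi_ids) for r in records)
    k = min(sizes.values())
    return {
        "k": k,
        "max_risk": 1 / k,
        "avg_risk": len(sizes) / len(records),
        "unique_records": sum(1 for s in sizes.values() if s == 1),
    }
```

A nonzero `unique_records` means at least one person is alone in their class and can be singled out by anyone who knows those attributes and that the person is in the data set.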
What De-Identification Is Not
[KEY REQUIREMENT] HIPAA de-identification does not require zero re-identification risk. It requires a documented method (Safe Harbor or Expert Determination) and the absence of actual knowledge of residual identifiability. That residual risk is exactly why GDPR does not automatically treat HIPAA-de-identified data as anonymous.
How GDPR Defines Anonymization
GDPR Recital 26 sets a stricter bar. Data is anonymized only when it is rendered anonymous "in such a manner that the data subject is not or no longer identifiable." The Article 29 Working Party's Opinion 05/2014 (the EDPB's predecessor body) frames the test against three risks: singling out, linkability, and inference. If any of the three remains possible using means reasonably likely to be used by the controller or by another person, the data is not anonymized. Once data clears that bar, GDPR no longer applies, but the bar is rarely cleared in practice for medical device telemetry, which is high-dimensional and easily re-linked.
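Linkability can be tested directly. The Python sketch below simulates the simplest version of the attack Recital 26 has in mind: join a released data set to an external one (for example, a hypothetical voter roll that carries names) on shared quasi-identifiers, and report released records that match exactly one named person. Real attackers use fuzzy and probabilistic matching, so an empty result here does not demonstrate anonymization; a non-empty one is a strong signal that anonymization has not been achieved.

```python
from collections import defaultdict


def unique_linkage(released: list[dict], external: list[dict],
                   quasi_ids: list[str]) -> list[tuple[int, str]]:
    """Return (released index, external name) pairs where a released record
    matches exactly one person in the external data set on the quasi-identifiers.

    Assumes each external row has a 'name' field (hypothetical schema).
    Exact matching only; this understates what a motivated attacker can do.
    """
    index: dict[tuple, list[str]] = defaultdict(list)
    for person in external:
        index[tuple(person[q] for q in quasi_ids)].append(person["name"])

    links = []
    for i, rec in enumerate(released):
        candidates = index.get(tuple(rec[q] for q in quasi_ids), [])
        if len(candidates) == 1:  # a single candidate means a likely re-identification
            links.append((i, candidates[0]))
    return links
```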
Pseudonymization Is a Third, Distinct Concept
Pseudonymization replaces direct identifiers with tokens while keeping a separate key that can re-identify subjects. GDPR Article 4(5) defines it explicitly and treats pseudonymized data as personal data still in scope. Many medical device pipelines describe their training data as "anonymized" when the technical implementation is pseudonymization with a key held by the sponsor or CRO. Under GDPR that is still a personal-data set. Under HIPAA its status depends on what was removed: it may be PHI, a limited data set (which is still PHI), or de-identified data if all 18 Safe Harbor identifiers are gone and the re-identification code meets 45 CFR 164.514(c).
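| Concept | Legal regime | Residual re-identification risk | Status |
|---|---|---|---|
| HIPAA Safe Harbor de-identification | HIPAA (US) | Tolerated once the 18 identifiers are removed and there is no actual knowledge of re-identifiability | Not PHI |
| HIPAA Expert Determination | HIPAA (US) | "Very small," as determined and documented by an expert | Not PHI |
| GDPR anonymization | GDPR (EU) | No re-identification by means reasonably likely to be used | Out of GDPR scope |
| Pseudonymization | GDPR Art. 4(5) (EU); HIPAA status depends on identifiers removed | Re-identifiable with the key | Personal data under GDPR; PHI (including a limited data set) or de-identified under HIPAA, depending on what was removed |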
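The following Python sketch shows what pseudonymization looks like in code and why it stays in scope: direct identifiers are swapped for random tokens, but a key table maps every token back to its subject. The class and method names are hypothetical. Random tokens, rather than a hash of the identifier, keep the token from being derived from the identifier itself; that does not change the GDPR status, because anyone holding the key table can re-identify every record.

```python
import secrets


class Pseudonymizer:
    """Replace a direct identifier (e.g. an MRN) with a random token.

    The identifier-to-token mapping is the 'key'. In a real deployment it
    would live in a separate, access-controlled system. As long as it exists,
    the tokenized data remains personal data under GDPR.
    """

    def __init__(self) -> None:
        self._key_table: dict[str, str] = {}  # identifier -> token
        self._reverse: dict[str, str] = {}    # token -> identifier

    def tokenize(self, identifier: str) -> str:
        # Reuse the same token for the same subject so records stay linkable.
        if identifier not in self._key_table:
            token = secrets.token_hex(16)  # 128 random bits, unrelated to the identifier
            self._key_table[identifier] = token
            self._reverse[token] = identifier
        return self._key_table[identifier]

    def reidentify(self, token: str) -> str:
        # This lookup is exactly what makes the data pseudonymous, not anonymous.
        return self._reverse[token]
```

Destroying the key table does not by itself make the data anonymous, since the remaining attributes may still allow singling out or linkage.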
What the FDA Expects in AI/ML Submissions
Training Data Provenance
FDA AI/ML submissions, particularly those that include a Predetermined Change Control Plan (PCCP), are expected to describe how training and validation data were collected, de-identified, and curated. Reviewers look for a named method (Safe Harbor or Expert Determination), the residual risk analysis, and the controls that prevent identifiers from re-entering the training pipeline during re-training cycles.
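One way to make "identifiers cannot re-enter the pipeline" a testable control is an ingestion gate that every training or re-training batch must pass. The Python sketch below uses an assumed field allowlist and two illustrative regular expressions: it rejects rows with fields outside the allowlist and flags string values that look like email addresses or US phone numbers. It shows the shape of such a control, not a validated identifier detector.

```python
import re

# Hypothetical schema: only these fields may enter the training set.
ALLOWED_FIELDS = {"record_token", "age", "sex", "event_year", "device_model", "ecg_summary"}

# Illustrative patterns only; a real control needs a broader, validated rule set.
IDENTIFIER_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "us_phone": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def gate_batch(batch: list[dict]) -> list[str]:
    """Return a list of violations; an empty list means the batch may be ingested."""
    violations = []
    for i, row in enumerate(batch):
        # Any field not on the allowlist is rejected by default.
        extra = set(row) - ALLOWED_FIELDS
        if extra:
            violations.append(f"row {i}: disallowed fields {sorted(extra)}")
        # Scan string values for identifier-like content.
        for field, value in row.items():
            if isinstance(value, str):
                for name, pattern in IDENTIFIER_PATTERNS.items():
                    if pattern.search(value):
                        violations.append(f"row {i}: {name} pattern in '{field}'")
    return violations
```

An allowlist is used rather than a blocklist so that a new upstream field is rejected by default instead of silently flowing into the training set.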
Re-Training and Postmarket Data
When a PCCP allows the model to be re-trained on postmarket data, the de-identification method must apply to that data stream too. A common deficiency is a clean Safe Harbor process for the initial training set with no documented method for the postmarket telemetry that feeds re-training.
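That gap can also be closed mechanically by requiring every batch to carry a provenance manifest and refusing batches from data streams with no documented method. In this Python sketch, `BatchManifest`, `DOCUMENTED_METHODS`, the source names, and the version strings are all hypothetical; in practice the documented methods would come from the submission and the quality system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BatchManifest:
    """Hypothetical provenance record attached to every data batch."""
    source: str          # e.g. "initial_training" or "postmarket_telemetry"
    deid_method: str     # e.g. "HIPAA Safe Harbor" or "HIPAA Expert Determination"
    method_version: str  # version of the documented procedure that was applied


# Hypothetical: the method documented for each data source.
DOCUMENTED_METHODS = {
    "initial_training": ("HIPAA Safe Harbor", "v1.2"),
    "postmarket_telemetry": ("HIPAA Safe Harbor", "v1.2"),
}


def admit_for_retraining(manifest: BatchManifest) -> bool:
    """Admit a batch only if its source has a documented method and the batch used it."""
    expected = DOCUMENTED_METHODS.get(manifest.source)
    if expected is None:
        # Undocumented data stream: the deficiency pattern this check guards against.
        return False
    return (manifest.deid_method, manifest.method_version) == expected
```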
Language Precision
Submissions that use "anonymized" loosely invite questions. The FDA does not require GDPR-grade anonymization for US submissions, but it does expect the term used in the file to match the method actually applied. "De-identified per the HIPAA Safe Harbor method" is the safer phrasing.
Common Re-Identification Risks in Device Data
- Device serial numbers and UDIs are unique identifiers under HIPAA Safe Harbor and must be removed or transformed.
- Timestamps with second-level precision combined with rare events or rare device models can re-identify individuals.
- High-dimensional waveforms (ECG, EEG, gait) are increasingly shown in the literature to be quasi-biometric and re-identifiable.
- Free-text fields in clinician notes and customer support cases routinely contain direct identifiers that automated scrubbers miss.
- Linkage to external data sets (insurance claims, social media, voter rolls) is the dominant re-identification pathway and the reason Recital 26 asks whether re-identification is possible by "all the means reasonably likely to be used ... either by the controller or by another person."
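The free-text problem is easy to demonstrate. The Python sketch below is a typical regex scrubber with three illustrative rules; the serial-number format is hypothetical. Run on a sample support note, it catches the phone number and the serial number but leaves the patient's name untouched, which is exactly the failure mode described in the list.

```python
import re

# Illustrative rules; names, street addresses, and misspellings are not covered.
SCRUB_RULES = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "[EMAIL]"),
    (re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"), "[PHONE]"),
    (re.compile(r"\bSN[-:]?\s?[A-Z0-9]{6,}\b"), "[SERIAL]"),  # hypothetical serial format
]


def scrub(text: str) -> str:
    """Apply each pattern in turn, replacing matches with a placeholder label."""
    for pattern, label in SCRUB_RULES:
        text = pattern.sub(label, text)
    return text


note = "Pt Maria Lopez called from 602-555-0143 about unit SN-AB123456 alarming."
print(scrub(note))
# Pt Maria Lopez called from [PHONE] about unit [SERIAL] alarming.
```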
How Blue Goat Approaches De-Identification for MedTech
We treat de-identification as a documented control inside the device quality system, not a one-time data engineering step. For every product that handles patient data we map the data flow, identify which regime applies at each hop (HIPAA, GDPR, IRB), document the method (Safe Harbor, Expert Determination, or pseudonymization) with residual-risk analysis, and wire the controls into the AI/ML re-training pipeline so the same standard applies postmarket. Our team holds CISSP and OSCP credentials with prior military red-team experience, and our work is grounded in 45 CFR 164.514, Article 29 Working Party Opinion 05/2014, the FDA's February 3, 2026 final premarket cybersecurity guidance, and the FDA's AI/ML guidance series. If the FDA raises cybersecurity deficiencies after our submission, we resolve them at no additional cost. Start with our HIPAA compliance program for MedTech or the AI/ML PCCP cybersecurity guide.
FAQ
What is the difference between de-identification and anonymization for medical device data?
De-identification is the HIPAA standard: remove the 18 Safe Harbor identifiers or apply Expert Determination so the data is no longer PHI, with a small documented residual re-identification risk allowed. Anonymization is the GDPR standard: the data must be altered so that no reasonably likely means could re-identify an individual, at which point GDPR no longer applies. Most medical device data sets meet the first bar, not the second.
Is HIPAA-de-identified data automatically GDPR-anonymized?
No. HIPAA de-identification tolerates a documented residual re-identification risk; GDPR anonymization does not. A data set that satisfies HIPAA Safe Harbor is typically still personal data under GDPR and remains in scope for EU rules. Manufacturers serving both markets need separate analyses for each regime.
Does pseudonymization count as anonymization?
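No. GDPR Article 4(5) treats pseudonymized data as personal data because a key still exists that can re-identify individuals. Under HIPAA, the answer depends on what was removed: a data set that keeps dates or town, city, or ZIP code detail is at most a limited data set, which is still PHI, while a data set stripped of all 18 Safe Harbor identifiers can be de-identified even with a re-identification code, provided the code meets 45 CFR 164.514(c). Calling pseudonymized data "anonymized" in a submission is a common and avoidable error.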
What does the FDA expect for AI/ML training data?
The FDA expects a named de-identification method, a residual-risk analysis, and evidence that identifiers cannot re-enter the training or re-training pipeline. For models under a Predetermined Change Control Plan, the same method must apply to the postmarket data stream that drives re-training, not only the initial training corpus.
Can device telemetry ever be truly anonymized?
Rarely. High-dimensional telemetry such as ECG waveforms, gait data, or detailed event logs is often quasi-biometric and re-linkable to known data sets. In most cases manufacturers should plan to handle such data as de-identified under HIPAA and as personal data under GDPR, rather than claim an anonymization that the data engineering cannot defend.
Ready to align your data pipeline with HIPAA, GDPR, and FDA AI/ML expectations?
If your device handles patient data and you need a defensible de-identification methodology documented across training, validation, and postmarket re-training, we can help. Schedule a discovery call.
Christian Espinosa, Founder, Blue Goat Cyber, CISSP, OSCP. Christian has led cybersecurity and data-handling reviews for AI/ML-enabled medical devices across Class II and Class III submissions and previously commanded military red-team operations.