De-Identification vs Anonymization

By Christian Espinosa, MBA, CISSP

Founder & CEO · Blue Goat Cyber

Published: June 26, 2026

Key Takeaways

HIPAA de-identification (Safe Harbor or Expert Determination) tolerates a small, documented re-identification risk; the data is no longer PHI but the obligation to track the methodology remains.
GDPR anonymization is a higher bar: the change must be irreversible, with no reasonably likely path to re-identification. Once that bar is met, GDPR no longer applies to the data set.
Pseudonymization is not anonymization; under GDPR pseudonymized data is still personal data and still in scope.
The FDA expects AI/ML submissions to describe the de-identification method, residual risk, and how identifiers are kept out of training and re-training pipelines.
Most medical device "anonymized" data sets are technically de-identified; misusing the term is a common deficiency pattern.

TL;DR

De-identification and anonymization are not the same thing. Under HIPAA, de-identification removes the 18 Safe Harbor identifiers (or relies on Expert Determination) so data is no longer Protected Health Information, but residual re-identification risk is permitted. Under GDPR, anonymization is a higher bar: the data must be irreversibly altered so no individual can be re-identified by any reasonably likely means, at which point GDPR no longer applies. For medical device manufacturers, real-world data, AI/ML training sets, and postmarket telemetry are usually de-identified, not anonymized, and the FDA expects that distinction to be documented.

Medical device teams routinely use "de-identified" and "anonymized" interchangeably in submissions, validation reports, and AI/ML training documentation. The two terms describe different legal standards, different technical thresholds, and different residual obligations. Getting the distinction wrong shows up in FDA AI/ML deficiency letters, in IRB findings, and in EU MDR technical files. This post clarifies the definitions, maps them to HIPAA Safe Harbor, HIPAA Expert Determination, GDPR Recital 26, and FDA AI/ML expectations, and explains where manufacturers most often slip.

Why this matters

Medical devices generate identifiable data at every stage: clinical evaluation, real-world performance studies, postmarket telemetry, AI/ML training corpora, and customer support cases. Each downstream use falls under a different legal regime (HIPAA in the US, GDPR in the EU, PIPEDA in Canada, plus institutional IRB rules), and each regime defines "no longer identifiable" differently. The FDA's February 3, 2026 final premarket cybersecurity guidance, the FDA's AI/ML guidance series, and the EU MDR technical documentation expectations all require manufacturers to document how patient data is handled across the product lifecycle. The HIPAA de-identification standard at 45 CFR 164.514, together with HHS's guidance on it, and Opinion 05/2014 on Anonymisation Techniques, issued by the Article 29 Working Party (the predecessor of the European Data Protection Board), remain the canonical references. A submission that claims "anonymized" data when the methodology actually meets only HIPAA Safe Harbor is technically inaccurate and creates risk during EU MDR review, IRB scrutiny, and FDA AI/ML re-training notifications.

How HIPAA Defines De-Identification

Safe Harbor

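HIPAA's Safe Harbor method (45 CFR 164.514(b)(2)) lists 18 categories of identifiers that must be removed. They include names; geographic subdivisions smaller than a state (the first three digits of a ZIP code may be kept when that area contains more than 20,000 people); all elements of dates other than the year for dates directly related to an individual, with ages over 89 aggregated into a single category of 90 or older; telephone numbers; email addresses; medical record numbers (MRNs); device identifiers and serial numbers; biometric identifiers; full-face photographs; and any other unique identifying number, characteristic, or code. Once those 18 categories are removed and the covered entity has no actual knowledge that the remaining information could be used alone or in combination to re-identify an individual, the data is no longer PHI.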
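As a rough illustration of what a Safe Harbor pass can look like in a data pipeline, the sketch below drops a handful of direct-identifier fields, reduces dates to the year, aggregates ages over 89, and truncates ZIP codes. The field names, the `safe_harbor_sketch` function, and the `restricted_zip3` parameter are assumptions made for this example, and the sketch covers only some of the 18 categories; it is not a complete or validated Safe Harbor implementation.

```python
from datetime import date

# Hypothetical field names for direct identifiers; a real schema will differ.
DIRECT_IDENTIFIER_FIELDS = {
    "name", "phone", "email", "mrn", "device_serial", "udi", "ip_address",
}

# Hypothetical date fields that relate directly to the individual.
DATE_FIELDS = ("birth_date", "implant_date")


def safe_harbor_sketch(record, restricted_zip3=frozenset()):
    """Return a copy of `record` with a few Safe Harbor style transformations.

    `restricted_zip3` is an assumed, caller-supplied set of three-digit ZIP
    prefixes whose areas hold 20,000 or fewer people; they become "000".
    """
    # Drop direct-identifier fields outright.
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIER_FIELDS}

    # Keep only the year of each date.
    for field in DATE_FIELDS:
        value = out.pop(field, None)
        if isinstance(value, date):
            out[field.replace("_date", "_year")] = value.year

    # Aggregate ages over 89 and drop the birth year that would reveal them.
    age = out.get("age")
    if isinstance(age, int) and age > 89:
        out["age"] = "90+"
        out.pop("birth_year", None)

    # Truncate ZIP codes to three digits, masking sparsely populated areas.
    zip_code = out.pop("zip", None)
    if zip_code is not None:
        zip3 = str(zip_code)[:3]
        out["zip3"] = "000" if zip3 in restricted_zip3 else zip3

    return out
```

Field-level removal like this does nothing for identifiers buried in free text or waveforms, so it would normally be only one step in a documented method.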

Expert Determination

The Expert Determination method (45 CFR 164.514(b)(1)) allows a qualified statistical expert to apply and document methods that render the risk of re-identification "very small." There is no fixed numeric threshold in the regulation; HHS guidance points to expert judgment with a documented analysis. Expert Determination is the typical path when device data sets need to retain higher granularity (full dates, device serial numbers, geographic detail) than Safe Harbor allows.
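The regulation does not prescribe a metric, but one measurement an expert might include in a documented analysis is how many records share each combination of quasi-identifiers (a k-anonymity style check). The sketch below is a minimal illustration under that assumption; the function names and the sample fields are hypothetical, and a real determination would weigh far more than group sizes.

```python
from collections import Counter


def _group_sizes(records, quasi_identifiers):
    """Count how many records share each quasi-identifier combination."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def smallest_group_size(records, quasi_identifiers):
    """Return the size of the smallest group (the 'k' in k-anonymity)."""
    sizes = _group_sizes(records, quasi_identifiers)
    return min(sizes.values()) if sizes else 0


def unique_fraction(records, quasi_identifiers):
    """Return the share of records whose combination appears exactly once."""
    if not records:
        return 0.0
    sizes = _group_sizes(records, quasi_identifiers)
    return sum(1 for n in sizes.values() if n == 1) / len(records)


# Made-up records for illustration only.
records = [
    {"birth_year": 1950, "zip3": "100", "model": "A"},
    {"birth_year": 1950, "zip3": "100", "model": "A"},
    {"birth_year": 1950, "zip3": "100", "model": "B"},
    {"birth_year": 1962, "zip3": "945", "model": "A"},
    {"birth_year": 1962, "zip3": "945", "model": "C"},
]

coarse_k = smallest_group_size(records, ["birth_year", "zip3"])
fine_k = smallest_group_size(records, ["birth_year", "zip3", "model"])
```

In this toy data, adding the device model to the quasi-identifiers shrinks the smallest group from two records to one, which is the kind of granularity trade-off the expert has to document.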

What De-Identification Is Not

Key requirement: HIPAA de-identification does not require zero re-identification risk. It requires a documented method (Safe Harbor or Expert Determination) and the absence of actual knowledge of residual identifiability. That residual risk is exactly why GDPR does not treat HIPAA-de-identified data as anonymous.

How GDPR Defines Anonymization

GDPR Recital 26 sets a stricter bar. Data is anonymized only when it is rendered anonymous "in such a manner that the data subject is not or no longer identifiable." Opinion 05/2014 on Anonymisation Techniques, issued by the Article 29 Working Party (the predecessor of the European Data Protection Board), frames the test against three risks: singling out, linkability, and inference. If any of the three remains reasonably likely using means likely to be used by the controller or another person, the data is not anonymized. Once data clears that bar, GDPR no longer applies, but the bar is rarely cleared in practice for medical device telemetry, which is high-dimensional and easily re-linked.

Pseudonymization Is a Third, Distinct Concept

Pseudonymization replaces direct identifiers with tokens while keeping a separate key that can re-identify subjects. GDPR Article 4(5) defines it explicitly and treats pseudonymized data as personal data still in scope. Many medical device pipelines describe their training data as "anonymized" when the technical implementation is pseudonymization with a key held by the sponsor or CRO. That is personal data under GDPR. Under HIPAA its status depends on what was removed: it may still be PHI, including a limited data set under 45 CFR 164.514(e), and it counts as de-identified only if the remaining data meets Safe Harbor or Expert Determination and the re-identification code satisfies 45 CFR 164.514(c).
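To make the role of the key concrete, here is a minimal sketch of keyed tokenization using an HMAC from the Python standard library. The `pseudonymize` and `relink` helpers and the sample identifiers are hypothetical; the point is only that anyone holding the key can recompute tokens and re-link records, which is why data handled this way is pseudonymized rather than anonymized.

```python
import hashlib
import hmac


def pseudonymize(patient_id: str, key: bytes) -> str:
    """Replace an identifier with a keyed, deterministic token."""
    return hmac.new(key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()


def relink(token: str, candidate_ids, key: bytes):
    """Show that the key holder can map a token back to a known identifier."""
    for patient_id in candidate_ids:
        if hmac.compare_digest(pseudonymize(patient_id, key), token):
            return patient_id
    return None


# Made-up key and identifiers for illustration only.
sponsor_key = b"example-key-held-by-the-sponsor"
token = pseudonymize("MRN-001", sponsor_key)

# With the key, the token resolves to the original identifier.
recovered = relink(token, ["MRN-000", "MRN-001"], sponsor_key)

# Without the right key, the same candidates do not match.
not_recovered = relink(token, ["MRN-000", "MRN-001"], b"some-other-key")
```

Because the token is deterministic, it also lets records for the same patient be joined across data sets, which is useful for longitudinal analysis and is exactly the linkability that keeps the data in scope.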

Concept	Legal regime	Re-identification risk	Status
HIPAA Safe Harbor de-identification	HIPAA (US)	Residual risk tolerated; no actual knowledge of identifiability	Not PHI
HIPAA Expert Determination	HIPAA (US)	Expert-judged "very small"	Not PHI
GDPR anonymization	GDPR (EU)	None reasonably likely	Out of GDPR scope
Pseudonymization	GDPR (EU) / HIPAA limited data set	Re-identifiable with key	In scope as personal data / limited PHI

What the FDA Expects in AI/ML Submissions

Training Data Provenance

FDA AI/ML submissions, particularly those that include a Predetermined Change Control Plan (PCCP), are expected to describe how training and validation data were collected, de-identified, and curated. Reviewers look for a named method (Safe Harbor or Expert Determination), the residual risk analysis, and the controls that prevent identifiers from re-entering the training pipeline during re-training cycles.

Re-Training and Postmarket Data

When a PCCP allows the model to be re-trained on postmarket data, the de-identification method must apply to that data stream too. A common deficiency is a clean Safe Harbor process for the initial training set with no documented method for the postmarket telemetry that feeds re-training.
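One way to apply the same standard to the postmarket stream is a gate in front of the re-training pipeline that quarantines any record still carrying identifier fields or identifier-like text. The sketch below assumes hypothetical field names and two deliberately simple patterns (email addresses and US-style phone numbers); it illustrates the control and is not a complete scrubber.

```python
import re

# Hypothetical field names that must never reach the training set.
FORBIDDEN_FIELDS = {"name", "mrn", "email", "phone", "device_serial", "udi"}

# Deliberately simple patterns; real free text needs far broader coverage.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")


def identifier_findings(record):
    """List the reasons a record should be kept out of re-training."""
    findings = [f"field:{k}" for k in sorted(record) if k in FORBIDDEN_FIELDS]
    for key, value in record.items():
        if isinstance(value, str):
            if EMAIL.search(value):
                findings.append(f"email-like text in {key}")
            if PHONE.search(value):
                findings.append(f"phone-like text in {key}")
    return findings


def admit_to_retraining(batch):
    """Split a batch into admitted records and quarantined records."""
    admitted, quarantined = [], []
    for record in batch:
        if identifier_findings(record):
            quarantined.append(record)
        else:
            admitted.append(record)
    return admitted, quarantined


# Made-up telemetry records for illustration only.
batch = [
    {"heart_rate": 72, "note": "routine check"},
    {"heart_rate": 80, "mrn": "12345"},
    {"heart_rate": 65, "note": "patient asked to be called at 555-123-4567"},
]
admitted, quarantined = admit_to_retraining(batch)
```

Quarantining rather than silently dropping records leaves an audit trail, which supports the documentation reviewers look for.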

Language Precision

Submissions that use "anonymized" loosely invite questions. The FDA does not require GDPR-grade anonymization for US submissions, but it does expect the term used in the file to match the method actually applied. "De-identified per the HIPAA Safe Harbor method" is the safer phrasing.

Common Re-Identification Risks in Device Data

Device serial numbers and UDIs are unique identifiers under HIPAA Safe Harbor and must be removed or transformed.
Timestamps with second-level precision combined with rare events or rare device models can re-identify individuals.
High-dimensional waveforms (ECG, EEG, gait) are increasingly shown in the literature to be quasi-biometric and re-identifiable.
Free-text fields in clinician notes and customer support cases routinely contain direct identifiers that automated scrubbers miss.
Linkage to external data sets (insurance claims, social media, voter rolls) is the dominant re-identification pathway and the reason GDPR Recital 26 requires anonymization to withstand all the means reasonably likely to be used by the controller or by another person.
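The linkage pathway in the last item can be sketched as a join on shared quasi-identifiers: any de-identified record that matches exactly one person in an external data set is a re-identification candidate. The `linkage_matches` function, the field names, and the data below are all made up for illustration.

```python
from collections import defaultdict


def linkage_matches(deidentified, external, keys):
    """Map record IDs to names where the quasi-identifiers match one person."""
    # Index the external data set by its quasi-identifier combination.
    index = defaultdict(list)
    for person in external:
        index[tuple(person[k] for k in keys)].append(person["name"])

    matches = {}
    for row in deidentified:
        candidates = index.get(tuple(row[k] for k in keys), [])
        # A single candidate means the record can be tied to one person.
        if len(candidates) == 1:
            matches[row["record_id"]] = candidates[0]
    return matches


# Made-up de-identified device records.
deidentified = [
    {"record_id": "r1", "birth_year": 1950, "zip3": "100"},
    {"record_id": "r2", "birth_year": 1971, "zip3": "945"},
]

# Made-up external data set, such as a public roll.
external = [
    {"name": "Person A", "birth_year": 1950, "zip3": "100"},
    {"name": "Person B", "birth_year": 1971, "zip3": "945"},
    {"name": "Person C", "birth_year": 1971, "zip3": "945"},
]

matches = linkage_matches(deidentified, external, ["birth_year", "zip3"])
```

Here record r1 links to a single person while r2 stays ambiguous between two. Each additional field shared with the external source, such as a precise timestamp or a rare device model, tends to turn ambiguous records into unique ones.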

How Blue Goat Approaches De-Identification for MedTech

We treat de-identification as a documented control inside the device quality system, not a one-time data engineering step. For every product that handles patient data we map the data flow, identify which regime applies at each hop (HIPAA, GDPR, IRB), document the method (Safe Harbor, Expert Determination, or pseudonymization) with residual-risk analysis, and wire the controls into the AI/ML re-training pipeline so the same standard applies postmarket. Our team holds CISSP and OSCP credentials with prior military red-team experience, and our work is grounded in 45 CFR 164.514, Article 29 Working Party Opinion 05/2014, the FDA's February 3, 2026 final premarket cybersecurity guidance, and the FDA's AI/ML guidance series. If the FDA raises cybersecurity deficiencies after our submission, we resolve them at no additional cost. Start with our HIPAA compliance program for MedTech or the AI/ML PCCP cybersecurity guide.

FAQ

What is the difference between de-identification and anonymization for medical device data?

De-identification is the HIPAA standard: remove the 18 Safe Harbor identifiers or apply Expert Determination so the data is no longer PHI, with a small documented residual re-identification risk allowed. Anonymization is the GDPR standard: the data must be altered so that no reasonably likely means could re-identify an individual, at which point GDPR no longer applies. Most medical device data sets meet the first bar, not the second.

Is HIPAA-de-identified data automatically GDPR-anonymized?

No. HIPAA de-identification tolerates a documented residual re-identification risk; GDPR anonymization does not. A data set that satisfies HIPAA Safe Harbor is typically still personal data under GDPR and remains in scope for EU rules. Manufacturers serving both markets need separate analyses for each regime.

Does pseudonymization count as anonymization?

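No. GDPR Article 4(5) treats pseudonymized data as personal data because a key still exists that can re-identify individuals. HIPAA is narrower: a covered entity may attach a re-identification code to de-identified data only if the code is not derived from information about the individual and the re-identification mechanism is not disclosed (45 CFR 164.514(c)), and the rest of the record must still meet Safe Harbor or Expert Determination. Coded data that falls short of those conditions remains PHI. Calling pseudonymized data "anonymized" in a submission is a common and avoidable error.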

What does the FDA expect for AI/ML training data?

The FDA expects a named de-identification method, a residual-risk analysis, and evidence that identifiers cannot re-enter the training or re-training pipeline. For models under a Predetermined Change Control Plan, the same method must apply to the postmarket data stream that drives re-training, not only the initial training corpus.

Can device telemetry ever be truly anonymized?

Rarely. High-dimensional telemetry such as ECG waveforms, gait data, or detailed event logs is often quasi-biometric and re-linkable to known data sets. In most cases manufacturers should plan to handle the data as de-identified under HIPAA and as personal data under GDPR, rather than claim an anonymization that the data engineering cannot defend.

If your device handles patient data and you need a defensible de-identification methodology documented across training, validation, and postmarket re-training, we can help. If the FDA raises cybersecurity deficiencies after our submission, we resolve them at no additional cost. Schedule a discovery call.

Christian Espinosa, Founder, Blue Goat Cyber, CISSP, OSCP. Christian has led cybersecurity and data-handling reviews for AI/ML-enabled medical devices across Class II and Class III submissions and previously commanded military red-team operations. Read more at christian-espinosa.
