
    De-Identification vs Anonymization for Medical Devices: HIPAA, GDPR, FDA

    How de-identification and anonymization differ for medical device data under HIPAA Safe Harbor, Expert Determination, GDPR, and FDA AI/ML expectations — and where teams get it wrong.


    By Christian Espinosa, MBA, CISSP

    Founder & CEO · Blue Goat Cyber

    Published: June 26, 2026

    Direct answer

    De-identification and anonymization are not the same thing. Under HIPAA, de-identification removes the 18 Safe Harbor identifiers (or relies on Expert Determination) so data is no longer Protected Health Information, but residual re-identification risk is permitted. Under GDPR, anonymization is a higher bar: the data must be irreversibly altered so no individual can be re-identified by any reasonably likely means, at which point GDPR no longer applies. For medical device manufacturers, real-world data, AI/ML training sets, and postmarket telemetry are usually de-identified, not anonymized, and the FDA expects that distinction to be documented.

    Medical device teams routinely use "de-identified" and "anonymized" interchangeably in submissions, validation reports, and AI/ML training documentation. The two terms describe different legal standards, different technical thresholds, and different residual obligations. Getting the distinction wrong shows up in FDA AI/ML deficiency letters, in IRB findings, and in EU MDR technical files. This post clarifies the definitions, maps them to HIPAA Safe Harbor, HIPAA Expert Determination, GDPR Recital 26, and FDA AI/ML expectations, and explains where manufacturers most often slip.

    Key Takeaways

    • HIPAA de-identification (Safe Harbor or Expert Determination) tolerates a small, documented re-identification risk; the data is no longer PHI but the obligation to track the methodology remains.
    • GDPR anonymization is a higher bar: the alteration must be irreversible, with no reasonably likely path to re-identification. Once that bar is met, GDPR no longer applies to the data set.
    • Pseudonymization is not anonymization; under GDPR pseudonymized data is still personal data and still in scope.
    • The FDA expects AI/ML submissions to describe the de-identification method, residual risk, and how identifiers are kept out of training and re-training pipelines.
    • Most medical device "anonymized" data sets are technically de-identified; misusing the term is a common deficiency pattern.


    Why this matters

    Medical devices generate identifiable data at every stage: clinical evaluation, real-world performance studies, postmarket telemetry, AI/ML training corpora, and customer support cases. Each downstream use can fall under a different legal regime (HIPAA in the US, GDPR in the EU, PIPEDA in Canada, plus institutional IRB rules), and each regime defines "no longer identifiable" differently. The FDA's February 3, 2026 final premarket cybersecurity guidance, the FDA's AI/ML guidance series, and the EU MDR technical documentation expectations all require manufacturers to document how patient data is handled across the product lifecycle. The HIPAA de-identification standard (45 CFR 164.514) with HHS's accompanying de-identification guidance, and the Article 29 Working Party's Opinion 05/2014 on Anonymisation Techniques (the Working Party was the predecessor of the European Data Protection Board), remain the canonical references. A submission that claims "anonymized" data when the methodology actually meets only HIPAA Safe Harbor is technically inaccurate and creates risk during EU MDR review, IRB scrutiny, and FDA AI/ML re-training notifications.

    How HIPAA Defines De-Identification

    Safe Harbor

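    HIPAA's Safe Harbor method (45 CFR 164.514(b)(2)) lists 18 categories of identifiers, for the individual and for relatives, employers, or household members, that must be removed. They include names; geographic subdivisions smaller than a state (the first three ZIP digits may be kept only if that combined area holds more than 20,000 people, otherwise they become 000); all elements of dates except the year for dates directly related to the individual; ages over 89, which must be aggregated into a single 90-or-older category; telephone numbers; email addresses; medical record numbers (MRNs); device identifiers and serial numbers; biometric identifiers; full-face photos; and any other unique identifying number, characteristic, or code. Once those identifiers are removed and the covered entity has no actual knowledge that the remaining information could be used alone or in combination to identify an individual, the data is no longer PHI.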
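    To make the Safe Harbor rules concrete, here is a minimal Python sketch that applies a few of them to one device record. The field names (`patient_name`, `device_serial`, `implant_date`, `zip`, and so on) and the `restricted_zip3` parameter are hypothetical, and only a handful of the 18 categories are covered, so treat it as an illustration of the transformations rather than a compliant implementation. The caller must supply the 3-digit ZIP prefixes whose combined population is 20,000 or fewer from current Census data; "036" is treated as restricted here purely for the example.

```python
from datetime import date

# Hypothetical field names for a device record; map them to your own schema.
# Only a subset of the 18 Safe Harbor categories is handled here.
DIRECT_IDENTIFIER_FIELDS = {
    "patient_name", "phone", "email", "mrn", "device_serial", "ip_address",
}
DATE_FIELDS = ("implant_date", "event_date")


def age_on(birth: date, today: date) -> int:
    """Age in whole years on `today`."""
    return today.year - birth.year - ((today.month, today.day) < (birth.month, birth.day))


def safe_harbor_transform(record: dict, restricted_zip3: set[str], today: date) -> dict:
    """Apply a subset of Safe Harbor rules to one record (illustrative only).

    restricted_zip3: caller-supplied 3-digit ZIP prefixes whose combined
    population is 20,000 or fewer, taken from current Census data.
    """
    # Drop direct identifiers outright.
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIER_FIELDS}

    # Birth date: keep age and birth year only for people aged 89 or younger;
    # older ages collapse into a single "90+" category with no birth year.
    birth = out.pop("birth_date", None)
    if birth is not None:
        age = age_on(birth, today)
        if age > 89:
            out["age"] = "90+"
        else:
            out["age"] = age
            out["birth_year"] = birth.year

    # Other dates directly related to the individual: keep the year only.
    for name in DATE_FIELDS:
        if isinstance(out.get(name), date):
            out[name] = out[name].year

    # ZIP code: keep the first three digits unless that area is too small.
    if "zip" in out:
        zip3 = str(out.pop("zip"))[:3]
        out["zip3"] = "000" if zip3 in restricted_zip3 else zip3

    return out


record = {
    "patient_name": "A. Patient", "device_serial": "SN-0048213",
    "birth_date": date(1930, 5, 1), "implant_date": date(2024, 3, 14),
    "zip": "03601", "battery_pct": 71,
}
print(safe_harbor_transform(record, restricted_zip3={"036"}, today=date(2026, 6, 26)))
# → {'implant_date': 2024, 'battery_pct': 71, 'age': '90+', 'zip3': '000'}
```

    Field-level rules like these do not catch identifiers buried in free-text notes, which need separate review.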

    Expert Determination

    The Expert Determination method (45 CFR 164.514(b)(1)) allows a qualified statistical expert to apply and document methods that render the risk of re-identification "very small." There is no fixed numeric threshold in the regulation; HHS guidance points to expert judgment with a documented analysis. Expert Determination is the typical path when device data sets need to retain higher granularity (full dates, device serial numbers, geographic detail) than Safe Harbor allows.
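    Because the regulation sets no numeric threshold, experts choose their own metrics. One widely used input is k-anonymity: group records by quasi-identifiers (fields that are not direct identifiers but can narrow down who someone is) and flag any group smaller than k. The sketch below assumes hypothetical field names and an illustrative k of 2. It is one possible piece of an expert's analysis, not a method HHS prescribes, and on its own it says nothing about inference or linkage to outside data.

```python
from collections import Counter


def equivalence_class_sizes(records: list[dict], quasi_identifiers: list[str]) -> Counter:
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r.get(q) for q in quasi_identifiers) for r in records)


def records_below_k(records: list[dict], quasi_identifiers: list[str], k: int) -> list[dict]:
    """Return records whose quasi-identifier combination appears fewer than k times."""
    sizes = equivalence_class_sizes(records, quasi_identifiers)
    return [r for r in records if sizes[tuple(r.get(q) for q in quasi_identifiers)] < k]


rows = [
    {"birth_year": 1961, "zip3": "021", "model": "X1"},
    {"birth_year": 1961, "zip3": "021", "model": "X1"},
    {"birth_year": 1948, "zip3": "021", "model": "X9"},
]
qi = ["birth_year", "zip3", "model"]
print(min(equivalence_class_sizes(rows, qi).values()))  # → 1
print(records_below_k(rows, qi, k=2))
# → [{'birth_year': 1948, 'zip3': '021', 'model': 'X9'}]
```

    A group of size 1 means that record is unique on those fields, which is exactly the kind of finding an expert would need to address (for example, by generalizing the rare device model) before calling the risk "very small."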

    What De-Identification Is Not

    [KEY REQUIREMENT] HIPAA de-identification does not require zero re-identification risk. It requires a documented method (Safe Harbor or Expert Determination) and the absence of actual knowledge of residual identifiability. That residual risk is exactly why HIPAA-de-identified data is not automatically anonymous under GDPR.

    How GDPR Defines Anonymization

    GDPR Recital 26 sets a stricter bar. Data is anonymous only when it has been rendered anonymous "in such a manner that the data subject is not or no longer identifiable." The Article 29 Working Party's Opinion 05/2014 frames the test against three risks: singling out, linkability, and inference. If any of the three remains possible using means reasonably likely to be used by the controller or by another person, the data is not anonymized. Once data clears that bar, GDPR no longer applies, but the bar is rarely cleared in practice for medical device telemetry, which is high-dimensional and easily re-linked.

    Pseudonymization Is a Third, Distinct Concept

    Pseudonymization replaces direct identifiers with tokens while keeping a separate key that can re-identify subjects. GDPR Article 4(5) defines it explicitly and treats pseudonymized data as personal data still in scope. Many medical device pipelines describe their training data as "anonymized" when the technical implementation is pseudonymization with a key held by the sponsor or CRO. Such a data set is personal data under GDPR; under HIPAA it is typically still PHI (possibly a limited data set), depending on which identifiers remain and how the key is managed.
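    The distinction is easy to see in code. This Python sketch shows two common pseudonymization patterns, a keyed HMAC-SHA256 token and a random code with a lookup table; the function names and the MRN format are made up for illustration. In both cases whoever holds the key or the table can link tokens back to people, which is why GDPR still treats the output as personal data.

```python
import hashlib
import hmac
import secrets


def keyed_token(identifier: str, key: bytes) -> str:
    """Deterministic pseudonym: HMAC-SHA256 of the identifier under a secret key."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def random_codes(identifiers: list[str]) -> dict[str, str]:
    """Random pseudonyms; the returned lookup table itself is the re-identification key."""
    return {ident: secrets.token_hex(16) for ident in identifiers}


key = secrets.token_bytes(32)  # held separately, e.g. by the sponsor or CRO
token = keyed_token("MRN-0012345", key)

# Whoever holds the key can recompute the token for a known MRN and link records.
assert hmac.compare_digest(token, keyed_token("MRN-0012345", key))

# Whoever holds the lookup table can reverse a random code directly.
table = random_codes(["MRN-0012345", "MRN-0067890"])
reverse = {code: ident for ident, code in table.items()}
assert reverse[table["MRN-0012345"]] == "MRN-0012345"
```

    One HIPAA-specific wrinkle, stated cautiously: because the HMAC token is computed from the MRN, it is derived from information about the individual, which 45 CFR 164.514(c) rules out for a re-identification code attached to de-identified data. Random codes held in a separately secured table fit that provision more closely.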

    | Concept | Legal regime | Re-identification risk | Status |
    |---|---|---|---|
    | HIPAA Safe Harbor de-identification | HIPAA (US) | Residual risk tolerated; no actual knowledge of re-identification | Not PHI |
    | HIPAA Expert Determination | HIPAA (US) | Expert-judged "very small" | Not PHI |
    | GDPR anonymization | GDPR (EU) | None reasonably likely | Out of GDPR scope |
    | Pseudonymization | GDPR (EU) / HIPAA limited data set | Re-identifiable with key | In scope as personal data / limited data set (still PHI) |

    What the FDA Expects in AI/ML Submissions

    Training Data Provenance


    FDA AI/ML submissions, particularly those that include a Predetermined Change Control Plan (PCCP), are expected to describe how training and validation data were collected, de-identified, and curated. Reviewers look for a named method (Safe Harbor or Expert Determination), the residual risk analysis, and the controls that prevent identifiers from re-entering the training pipeline during re-training cycles.
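    One way to keep those three items in front of reviewers is a provenance record per data set. The sketch below is a hypothetical Python schema, not an FDA-defined format: the enum forces a named method, and `gaps()` reports when the residual-risk analysis or the pipeline controls are missing. Field names and document IDs such as `RRA-001` are placeholders.

```python
from dataclasses import dataclass, field
from enum import Enum


class DeidMethod(Enum):
    SAFE_HARBOR = "HIPAA Safe Harbor, 45 CFR 164.514(b)(2)"
    EXPERT_DETERMINATION = "HIPAA Expert Determination, 45 CFR 164.514(b)(1)"
    PSEUDONYMIZATION = "Pseudonymization (personal data under GDPR Art. 4(5))"
    GDPR_ANONYMIZATION = "GDPR anonymization (Recital 26)"


@dataclass
class DatasetProvenance:
    """One entry per training, validation, or re-training data set (hypothetical schema)."""
    dataset_id: str
    source: str
    method: DeidMethod                 # a named method is mandatory by construction
    residual_risk_ref: str = ""        # document ID of the residual-risk analysis
    pipeline_controls: list[str] = field(default_factory=list)

    def gaps(self) -> list[str]:
        """List the reviewer-facing items that are still missing."""
        issues = []
        if not self.residual_risk_ref:
            issues.append("no residual-risk analysis referenced")
        if not self.pipeline_controls:
            issues.append("no controls keeping identifiers out of re-training")
        return issues


initial = DatasetProvenance("train-v1", "site telemetry export", DeidMethod.SAFE_HARBOR)
print(initial.gaps())
# → ['no residual-risk analysis referenced', 'no controls keeping identifiers out of re-training']
```

    Keeping one record per data set, including each re-training batch, makes it straightforward to show that the same method was applied throughout.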

    Re-Training and Postmarket Data

    When a PCCP allows the model to be re-trained on postmarket data, the de-identification method must apply to that data stream too. A common deficiency is a clean Safe Harbor process for the initial training set with no documented method for the postmarket telemetry that feeds re-training.
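    A simple way to hold the postmarket stream to the same standard is an intake gate that rejects any batch not processed under the approved method or still carrying identifier fields. This Python sketch assumes a hypothetical batch format (a `deid_method` label plus a list of record dicts) and a made-up `SN-` serial-number pattern. A regex check like this is only a backstop, since automated scrubbers miss identifiers in free text.

```python
import re

# Hypothetical field names and serial format; replace with your own schema.
FORBIDDEN_FIELDS = {"patient_name", "mrn", "device_serial", "email", "phone"}
SERIAL_PATTERN = re.compile(r"\bSN-\d{6,}\b")


def check_retraining_batch(batch: dict, approved_method: str) -> list[str]:
    """Return reasons to reject a postmarket batch before it reaches re-training."""
    problems = []
    method = batch.get("deid_method")
    if method != approved_method:
        problems.append(f"method {method!r} does not match approved {approved_method!r}")
    for i, rec in enumerate(batch.get("records", [])):
        # Identifier fields that should have been removed upstream.
        leaked = rec.keys() & FORBIDDEN_FIELDS
        if leaked:
            problems.append(f"record {i}: forbidden fields {sorted(leaked)}")
        # Serial-number-like strings hiding in otherwise allowed fields.
        for name, value in rec.items():
            if isinstance(value, str) and SERIAL_PATTERN.search(value):
                problems.append(f"record {i}: serial-like value in {name!r}")
    return problems


batch = {
    "deid_method": "safe_harbor_v2",
    "records": [
        {"event": "lead impedance high", "note": "replaced SN-0048213"},
        {"mrn": "123", "event": "battery low"},
    ],
}
print(check_retraining_batch(batch, approved_method="safe_harbor_v2"))
# → ["record 0: serial-like value in 'note'", "record 1: forbidden fields ['mrn']"]
```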

    Language Precision

    Submissions that use "anonymized" loosely invite questions. The FDA does not require GDPR-grade anonymization for US submissions, but it does expect the term used in the file to match the method actually applied. "De-identified per the HIPAA Safe Harbor method" is the safer phrasing.
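    Some teams add a terminology check to document review. The Python sketch below flags sentences that use "anonymize", "anonymized", or "anonymization" (either spelling) when the documented method is anything other than GDPR anonymization. The method labels are hypothetical, and the naive sentence split will misfire on abbreviations, so it only surfaces candidates for a human reviewer.

```python
import re

ANONYMIZATION_TERM = re.compile(r"\banonymi[sz](?:e|ed|ation)\b", re.IGNORECASE)


def flag_anonymization_claims(text: str, documented_method: str) -> list[str]:
    """Return sentences using 'anonymized'-family terms when the documented
    method is not GDPR anonymization."""
    if documented_method == "gdpr_anonymization":
        return []
    # Naive split on sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if ANONYMIZATION_TERM.search(s)]


draft = (
    "Training data were anonymized prior to model development. "
    "Records were de-identified per the HIPAA Safe Harbor method."
)
print(flag_anonymization_claims(draft, documented_method="hipaa_safe_harbor"))
# → ['Training data were anonymized prior to model development.']
```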

    Common Re-Identification Risks in Device Data

    • Device serial numbers and UDIs are unique identifiers under HIPAA Safe Harbor and must be removed or transformed.
    • Timestamps with second-level precision combined with rare events or rare device models can re-identify individuals.
    • High-dimensional waveforms (ECG, EEG, gait) are increasingly shown in the literature to be quasi-biometric and re-identifiable.
    • Free-text fields in clinician notes and customer support cases routinely contain direct identifiers that automated scrubbers miss.
    • Linkage to external data sets (insurance claims, social media, voter rolls) is the dominant re-identification pathway and the reason GDPR Recital 26 tests anonymization against all means reasonably likely to be used, either by the controller or by another person.
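    Linkage risk can be estimated before release by joining the released quasi-identifiers against a data set an attacker might plausibly hold. The Python sketch below counts, for each released record, how many external records share the same values; a count of exactly one is a candidate re-identification. The external roster and field names are invented, and a real assessment would also need to consider whether the released individuals actually appear in the external source.

```python
from collections import defaultdict


def linkage_match_counts(released: list[dict], external: list[dict], keys: list[str]) -> list[int]:
    """For each released record, count external records with identical values on `keys`."""
    index = defaultdict(int)
    for ext in external:
        index[tuple(ext.get(k) for k in keys)] += 1
    return [index.get(tuple(r.get(k) for k in keys), 0) for r in released]


def unique_link_rate(released: list[dict], external: list[dict], keys: list[str]) -> float:
    """Fraction of released records that match exactly one external record."""
    counts = linkage_match_counts(released, external, keys)
    return sum(c == 1 for c in counts) / len(counts) if counts else 0.0


released = [
    {"birth_year": 1961, "zip3": "021", "sex": "F", "model": "X1"},
    {"birth_year": 1975, "zip3": "100", "sex": "M", "model": "X9"},
]
# Stand-in for a public roster with names attached (made-up rows).
external = [
    {"name": "Person 1", "birth_year": 1961, "zip3": "021", "sex": "F"},
    {"name": "Person 2", "birth_year": 1975, "zip3": "100", "sex": "M"},
    {"name": "Person 3", "birth_year": 1975, "zip3": "100", "sex": "M"},
]
keys = ["birth_year", "zip3", "sex"]
print(linkage_match_counts(released, external, keys))  # → [1, 2]
print(unique_link_rate(released, external, keys))      # → 0.5
```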

    How Blue Goat Approaches De-Identification for MedTech

    We treat de-identification as a documented control inside the device quality system, not a one-time data engineering step. For every product that handles patient data we map the data flow, identify which regime applies at each hop (HIPAA, GDPR, IRB), document the method (Safe Harbor, Expert Determination, or pseudonymization) with residual-risk analysis, and wire the controls into the AI/ML re-training pipeline so the same standard applies postmarket. Our team holds CISSP and OSCP credentials with prior military red-team experience, and our work is grounded in 45 CFR 164.514, Article 29 Working Party Opinion 05/2014, the FDA's February 3, 2026 final premarket cybersecurity guidance, and the FDA's AI/ML guidance series. If the FDA raises cybersecurity deficiencies after our submission, we resolve them at no additional cost. Start with our HIPAA compliance program for MedTech or the AI/ML PCCP cybersecurity guide.

    FAQ

    What is the difference between de-identification and anonymization for medical device data?

    De-identification is the HIPAA standard: remove the 18 Safe Harbor identifiers or apply Expert Determination so the data is no longer PHI, with a small documented residual re-identification risk allowed. Anonymization is the GDPR standard: the data must be altered so that no reasonably likely means could re-identify an individual, at which point GDPR no longer applies. Most medical device data sets meet the first bar, not the second.

    Is HIPAA-de-identified data automatically GDPR-anonymized?

    No. HIPAA de-identification tolerates a documented residual re-identification risk; GDPR anonymization does not. A data set that satisfies HIPAA Safe Harbor is typically still personal data under GDPR and remains in scope for EU rules. Manufacturers serving both markets need separate analyses for each regime.

    Does pseudonymization count as anonymization?

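    No. GDPR Article 4(5) treats pseudonymized data as personal data because a key still exists that can re-identify individuals. Under HIPAA, tokenized data counts as de-identified only if Safe Harbor or Expert Determination is satisfied and any re-identification code meets 45 CFR 164.514(c): the code must not be derived from information about the individual, and the re-identification mechanism must not be disclosed. Otherwise the data remains PHI, possibly as a limited data set. Calling pseudonymized data "anonymized" in a submission is a common and avoidable error.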

    What does the FDA expect for AI/ML training data?

    The FDA expects a named de-identification method, a residual-risk analysis, and evidence that identifiers cannot re-enter the training or re-training pipeline. For models under a Predetermined Change Control Plan, the same method must apply to the postmarket data stream that drives re-training, not only the initial training corpus.

    Can device telemetry ever be truly anonymized?

    Rarely. High-dimensional telemetry such as ECG waveforms, gait data, or detailed event logs is often quasi-biometric and re-linkable to known data sets. In most cases manufacturers should plan to treat such data as de-identified under HIPAA but still personal data under GDPR, rather than claim an anonymization that the data engineering cannot defend.

    Ready to align your data pipeline with HIPAA, GDPR, and FDA AI/ML expectations?

    If your device handles patient data and you need a defensible de-identification methodology documented across training, validation, and postmarket re-training, we can help. Schedule a discovery call.


    Christian Espinosa, Founder, Blue Goat Cyber, CISSP, OSCP. Christian has led cybersecurity and data-handling reviews for AI/ML-enabled medical devices across Class II and Class III submissions and previously commanded military red-team operations. Read more at christian-espinosa.
