    Penetration Testing for Medical Devices: A 2026 Explainer

    What medical device penetration testing is, why the FDA requires it under Section 524B, the four FDA-expected test categories, scope by device archetype, and what a credible deliverable contains.

    By Christian Espinosa, MBA, CISSP

    Founder & CEO · Blue Goat Cyber

    A plain-language guide to medical device penetration testing - what it is, what the FDA actually requires, how it differs by device archetype, and what a submission-ready deliverable looks like. Aligned to the February 2026 final premarket cybersecurity guidance and Section 524B of the FD&C Act.

    Last updated: June 2026.

    1. What Medical Device Penetration Testing Is (and Is Not)

    A medical device penetration test is a time-boxed, exploit-driven evaluation of a real device by independent testers. The tester's job is to behave like a motivated attacker: chain weaknesses, bypass controls, and demonstrate impact - not just list missing patches.

    It is not:

    • A vulnerability scan (Nessus, OpenVAS) - automated, signature-based, no exploitation.
    • A code review or SAST run - source-level, no runtime proof.
    • A threat model - paper exercise that defines what could go wrong; the pen test proves what actually does.
    • A red team - broader, goal-oriented adversary simulation, often spanning people, process, and physical access.

    A pen test sits between the threat model and the red team: narrower than a red team, more concrete than a threat model, and reproducible enough to land in an eSTAR submission.

    2. Why the FDA Requires It

    Under Section 524B(b)(2) of the FD&C Act, the FDA requires manufacturers of "cyber devices" to provide a reasonable assurance of cybersecurity, including evidence of testing. The February 2026 final guidance operationalizes this in Section V.B.4 - Penetration Testing, which expects:

    1. Independent testers (organizationally separate from the development team).
    2. Testing against the final or near-final device build.
    3. Coverage of every external interface in the security architecture.
    4. A report that the FDA reviewer can map back to your threat model and risk file.

    Pen testing also appears - implicitly - in the labeling, vulnerability-management, and postmarket sections. If your submission says you'll handle a class of vulnerability, the pen test is where you prove the control works.

    3. The Four FDA-Expected Test Categories

    The 2026 guidance and AAMI TIR57 expect coverage across four overlapping testing modes. Covering only one of them is the most common reason a deficiency letter cites "insufficient testing".

    3.1 Vulnerability Chaining

    Combining multiple low- or medium-severity findings into a high-impact attack path - e.g. an unauthenticated info-leak plus a weak update signature plus a debug shell. Chaining is where most real exploits live, and it's the category most often skipped by scan-only "pen tests".
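
    One way to picture chaining is as a path search: each finding moves the attacker from one state to another, and a chain is any route from the starting position to a high-impact state. The sketch below is illustrative only. The state names, finding titles, and the `find_chain` helper are hypothetical, and the three findings simply mirror the example in the paragraph above.

```python
from collections import deque

# Hypothetical findings, each modeled as (state before, state after, title).
FINDINGS = [
    ("network_access", "internals_known", "Unauthenticated info leak (medium)"),
    ("internals_known", "malicious_update_installed", "Weak update signature (medium)"),
    ("malicious_update_installed", "root_shell", "Debug shell left enabled (low)"),
]

def find_chain(findings, start, goal):
    """Breadth-first search for the shortest sequence of findings from start to goal."""
    graph = {}
    for src, dst, title in findings:
        graph.setdefault(src, []).append((dst, title))
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, path = queue.popleft()
        if state == goal:
            return path
        for nxt, title in graph.get(state, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [title]))
    return None  # no chain connects start to goal

if __name__ == "__main__":
    for step in find_chain(FINDINGS, "network_access", "root_shell") or []:
        print(step)
```

    No single finding here is rated high, yet together they form a path from network access to a root shell. That path is what a scan-only report cannot show.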

    3.2 Boundary Analysis

    Probing every trust boundary identified in the threat model: USB, serial, BLE, Wi-Fi, cellular, cloud API, mobile companion, service tools, and clinician portals. The deliverable should show that each boundary was exercised, not just the obvious ones.
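
    A coverage check of this kind is easy to automate. The following is a minimal sketch, assuming the threat model exports a list of boundary names and the test log maps each boundary to the test IDs run against it. The boundary list, the test IDs, and the `coverage_gaps` function are hypothetical placeholders.

```python
# Hypothetical boundary list taken from a threat model.
THREAT_MODEL_BOUNDARIES = {"USB", "serial", "BLE", "Wi-Fi", "cloud API", "mobile companion"}

# Hypothetical test log: boundary name -> IDs of tests executed against it.
TESTS_RUN = {
    "USB": ["T-001"],
    "BLE": ["T-010", "T-011"],
    "cloud API": ["T-020"],
    "Wi-Fi": [],  # listed in the plan, but nothing was actually executed
}

def coverage_gaps(boundaries, tests_run):
    """Return the boundaries that have no executed tests, sorted for stable output."""
    return sorted(b for b in boundaries if not tests_run.get(b))

if __name__ == "__main__":
    print(coverage_gaps(THREAT_MODEL_BOUNDARIES, TESTS_RUN))
```

    An empty result is the condition to aim for before the report goes out. Any boundary the function returns either needs a test or needs a documented reason for being out of scope.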

    3.3 Closed-Box (Black-Box) Testing

    Testing from the attacker's starting position with no privileged knowledge - reverse engineering firmware, sniffing wireless protocols, fuzzing exposed services. Closed-box testing is how the FDA validates that your security relies on real controls, not on obscurity.

    3.4 Manual Exploit Verification

    Every finding the team reports must include a working proof-of-concept or a documented reason it could not be exploited (with the conditions that would change that). Scanner output alone is not sufficient evidence under the 2026 guidance.

    4. Scope by Device Archetype

    The same four test categories apply to every device, but the surface area and tooling differ widely. Use this table to size an engagement before you scope it.

    Archetype | Primary attack surface | Specialist skills needed
    Embedded / firmware device | UART, JTAG, SPI flash, secure boot, signed updates, debug interfaces | Hardware reversing, firmware extraction (chip-off, JTAG), Ghidra / IDA, signed-update bypass
    SaMD (mobile + cloud) | Mobile app (iOS/Android), backend APIs, IAM, tenant isolation, secrets handling | OWASP MASVS, MobSF, Frida, Burp, cloud auth flow analysis
    Implantable + BLE-paired companion | BLE pairing & bonding, GATT services, replay & MITM, companion app | Sniffers (Ubertooth, Sniffle), BLE fuzzing, ATT/GATT protocol analysis
    Imaging modality / DICOM / PACS | DICOM/DIMSE, HL7, network segmentation, vendor service accounts | DICOM toolkits (dcm4che), HL7 fuzzing, hospital-network simulation
    Hospital-network-connected device | Wired/wireless onboarding, 802.1X, default credentials, service portals | Network attack tooling, AD/Kerberos basics, segmentation testing
    AI/ML SaMD | Model integrity, adversarial inputs, prompt/data poisoning, MLOps pipeline | Adversarial ML, model-signing review, pipeline IAM review

    A credible scoping conversation names the archetype, lists the boundaries from the threat model, and assigns hours to each.

    5. The Test Environment Problem

    The 2026 guidance expects testing against a production-equivalent configuration. In practice this means:

    • The exact hardware revision that will ship (not an early prototype with extra debug headers).
    • The release-candidate firmware/software build, signed with production keys where possible.
    • Surrogate networks, surrogate patient data, and - for hospital-connected devices - a hospital-network simulator (EHR, PACS, DHCP, NTP).
    • Documented deviations from production, with a justification for why each deviation doesn't change the security posture.

    If the FDA reviewer can't tell from the report what was tested, the test doesn't count. Include build hashes, hardware serials, and network diagrams.
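
    Recording a build hash takes only a few lines. The sketch below uses Python's standard `hashlib` to compute the SHA-256 digest of a firmware image so the report can pin the exact build that was tested. The file name in the usage example is a hypothetical placeholder.

```python
import hashlib

def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

if __name__ == "__main__":
    import sys
    # Usage (hypothetical file name): python build_hash.py firmware-rc1.bin
    for image in sys.argv[1:]:
        print(f"{sha256_of_file(image)}  {image}")
```

    Putting the digest in the scope statement, next to the hardware serials, lets a reviewer confirm that the tested build is the one in the submission.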

    6. Methodologies the FDA Recognizes

    Cite the methodology you actually followed. The 2026 guidance and AAMI TIR57 explicitly reference:

    • OWASP MASVS / MASTG (formerly MSTG) - mobile companion apps.
    • OWASP ASVS - web/cloud backends and clinician portals.
    • OWASP IoT Top 10 - embedded device weakness classes.
    • ISA/IEC 62443-4-2 - component-level cybersecurity requirements.
    • NIST SP 800-115 - overall testing process and reporting structure.
    • AAMI TIR57 - security risk management linkage between threats, tests, and risk file.

    A report that names its methodology, version, and the specific controls tested is far harder for a reviewer to push back on.

    7. What a Credible Pen-Test Deliverable Contains

    The minimum the FDA expects (and the minimum we ship on every engagement):

    1. Executive summary - one page, risk-rated, non-technical.
    2. Scope statement - device, build, environment, dates, in/out of scope.
    3. Methodology - which standards were followed, which tools used.
    4. Boundary coverage matrix - every threat-model boundary mapped to the tests run against it.
    5. Findings - each with CVSS v3.1, EPSS, exploitability narrative, screenshots/PoCs, remediation guidance, and a CWE.
    6. Chained attack paths - at least one end-to-end narrative per device.
    7. Retest results - findings that were fixed and re-verified.
    8. Residual risk statement - aligned to your ISO 14971 risk file.
    9. Tester attestation - independence, qualifications, hours.
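
    To make the findings entry in the list above concrete, here is a minimal sketch of one finding as a structured record. The field names and the example values are hypothetical. The severity bands follow the CVSS v3.1 qualitative rating scale (None 0.0, Low 0.1 - 3.9, Medium 4.0 - 6.9, High 7.0 - 8.9, Critical 9.0 - 10.0), and EPSS is treated as a probability between 0 and 1.

```python
from dataclasses import dataclass

@dataclass
class Finding:
    title: str
    cvss_score: float   # CVSS v3.1 base score, 0.0 to 10.0
    epss: float         # EPSS probability, 0.0 to 1.0
    cwe: str            # e.g. "CWE-347"
    poc_reference: str  # pointer to the proof-of-concept evidence
    remediation: str

    def __post_init__(self):
        if not 0.0 <= self.cvss_score <= 10.0:
            raise ValueError("CVSS v3.1 base score must be between 0.0 and 10.0")
        if not 0.0 <= self.epss <= 1.0:
            raise ValueError("EPSS must be a probability between 0.0 and 1.0")

    @property
    def severity(self):
        """Map the base score to the CVSS v3.1 qualitative severity rating."""
        if self.cvss_score == 0.0:
            return "None"
        if self.cvss_score < 4.0:
            return "Low"
        if self.cvss_score < 7.0:
            return "Medium"
        if self.cvss_score < 9.0:
            return "High"
        return "Critical"

# Hypothetical example finding; every value is illustrative.
example = Finding(
    title="Unsigned firmware image accepted over BLE update service",
    cvss_score=8.8,
    epss=0.02,
    cwe="CWE-347",
    poc_reference="appendix-b/poc-03",
    remediation="Verify the update signature on-device before flashing.",
)
```

    Keeping findings in a structure like this makes it straightforward to check that no entry ships without a score, a CWE, and a PoC reference.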

    If your current vendor's report is a Nessus export with a cover page, you do not have a submission-ready deliverable. See our companion piece, 12 Critical Findings from Medical Device Penetration Tests, for what the body of a real report looks like.

    8. Pen Test vs Red Team vs Tabletop vs Threat Model

    Activity | Goal | Output | When you need it
    Threat model | Identify what could go wrong | STRIDE/attack-tree diagrams, risk register | Pre-design and continuously
    Penetration test | Prove which threats are real on the built device | Findings, PoCs, residual risk | Before every FDA submission
    Red team | Test detection + response across people/process | Adversary narrative, control gaps | Mature postmarket programs
    Tabletop | Stress-test the incident-response plan | Lessons learned, playbook updates | Annually, postmarket

    Most manufacturers need a threat model and a pen test for premarket; red teams and tabletops belong to the postmarket cybersecurity program.

    9. Cost and Timeline Ranges

    The ranges below are meant for budgeting. Actual quotes depend on the archetype and the number of interfaces:

    Device archetype | Typical duration | Typical investment (USD)
    SaMD only (mobile + cloud) | 2 - 3 weeks | $25K - $55K
    Single embedded device, no wireless | 3 - 4 weeks | $40K - $75K
    Embedded + BLE companion app | 4 - 6 weeks | $60K - $110K
    Imaging / DICOM / hospital-connected | 5 - 8 weeks | $75K - $150K
    Multi-device platform (gateway + nodes + cloud) | 8 - 12 weeks | $120K - $250K+

    Quotes below these ranges almost always omit one of the four test categories. Quotes above them usually include premarket consulting, a retest, and FDA Q-response support.

    10. Frequently Asked Questions

    Does the FDA actually require a pen test? Yes. Section 524B(b)(2) requires testing evidence, and the February 2026 final guidance explicitly calls out penetration testing in Section V.B.4.

    Can our internal security team do it? No. The 2026 guidance expects independent testers - organizationally separate from the development team. An internal team can supplement, not replace, an independent assessment.

    When in the development cycle should we test? Against the release-candidate build, with enough runway to remediate critical findings and retest before submission. Most manufacturers schedule the pen test 8 - 12 weeks before their target FDA filing date.

    Does a pen test cover SBOM / vulnerability management? Partly. The pen test will exercise components in your SBOM and may surface exploitable CVEs, but the ongoing CVE-matching loop is a separate program. See our SBOM vulnerability management guide.

    Do we need a new pen test for every software update? No. Material design changes - new interfaces, new wireless, new cloud integration, new AI/ML model - trigger a retest. Routine maintenance releases do not, provided your postmarket vulnerability-management plan is operating.
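
    That rule can be written as a simple release-gate check. The sketch below is one possible reading of the answer above, not a regulatory test. The change tags and the `needs_retest` function are hypothetical, and treating a non-operating vulnerability-management plan as a retest trigger is an assumption made for the example.

```python
# Hypothetical tags for the material design changes named in the answer above.
MATERIAL_CHANGES = {
    "new_interface",
    "new_wireless",
    "new_cloud_integration",
    "new_ai_ml_model",
}

def needs_retest(change_tags, vuln_mgmt_plan_operating=True):
    """Return True when a release should go back through penetration testing.

    Assumption: if the postmarket vulnerability-management plan is not
    operating, the exemption for routine releases no longer applies.
    """
    if not vuln_mgmt_plan_operating:
        return True
    return bool(MATERIAL_CHANGES & set(change_tags))

if __name__ == "__main__":
    print(needs_retest(["bug_fix", "dependency_bump"]))  # routine release
    print(needs_retest(["bug_fix", "new_wireless"]))     # material change
```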

    How does this differ for Class III / PMA devices? Higher scrutiny, more required artifacts, longer engagements. See FDA PMA cybersecurity requirements and FDA pathway cybersecurity differences.

    11. Where to Go Next

    Ready to scope a pen test against the February 2026 guidance? Contact Blue Goat Cyber for a fixed-scope, fixed-price proposal.

    Sources & references

    Primary sources cited in this article.

    1. February 2026 final premarket cybersecurity guidance - U.S. FDA