
A plain-language guide to medical device penetration testing - what it is, what the FDA actually requires, how it differs by device archetype, and what a submission-ready deliverable looks like. Aligned to the February 2026 final premarket cybersecurity guidance and Section 524B of the FD&C Act.
Last updated: June 2026.
1. What Medical Device Penetration Testing Is (and Is Not)
A medical device penetration test is a time-boxed, exploit-driven evaluation of a real device by independent testers. The tester's job is to behave like a motivated attacker: chain weaknesses, bypass controls, and demonstrate impact - not just list missing patches.
It is not:
- A vulnerability scan (Nessus, OpenVAS) - automated, signature-based, no exploitation.
- A code review or SAST run - source-level, no runtime proof.
- A threat model - a paper exercise that defines what could go wrong; the pen test proves what actually can.
- A red team - broader, goal-oriented adversary simulation, often spanning people, process, and physical access.
A pen test sits between the threat model and the red team: narrower than a red team, more concrete than a threat model, and reproducible enough to land in an eSTAR submission.
2. Why the FDA Requires It
Under Section 524B(b)(2) the FDA requires manufacturers of "cyber devices" to provide a reasonable assurance of cybersecurity, including evidence of testing. The February 2026 final guidance operationalizes this in Section V.B.4 - Penetration Testing, which expects:
- Independent testers (organizationally separate from the development team).
- Testing against the final or near-final device build.
- Coverage of every external interface in the security architecture.
- A report that the FDA reviewer can map back to your threat model and risk file.
Pen testing also appears - implicitly - in the labeling, vulnerability-management, and postmarket sections. If your submission says you'll handle a class of vulnerability, the pen test is where you prove the control works.
3. The Four FDA-Expected Test Categories
The 2026 guidance and AAMI TIR57 expect coverage across four overlapping testing modes. Covering only one of them is the most common reason a deficiency letter cites "insufficient testing".
3.1 Vulnerability Chaining
Combining multiple low- or medium-severity findings into a high-impact attack path - e.g. an unauthenticated info-leak plus a weak update signature plus a debug shell. Chaining is where most real exploits live, and it's the category most often skipped by scan-only "pen tests".
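One way to reason about chaining is to treat each finding as a step that moves the attacker from one state to another, then search for a path from the starting position to the impact. The sketch below is illustrative only: the finding names come from the example above, while the attacker states and the `find_chain` helper are assumptions made for this example, not part of any guidance or tool.

```python
from collections import deque

# Hypothetical findings: (name, attacker state required, attacker state gained).
FINDINGS = [
    ("unauthenticated info-leak", "network access", "firmware version known"),
    ("weak update signature", "firmware version known", "modified firmware installed"),
    ("debug shell", "modified firmware installed", "root on device"),
]

def find_chain(findings, start, goal):
    """Breadth-first search for the shortest sequence of findings from start to goal."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, path = queue.popleft()
        if state == goal:
            return path
        for name, needs, gains in findings:
            if needs == state and gains not in seen:
                seen.add(gains)
                queue.append((gains, path + [name]))
    return None  # no combination of findings links start to goal

chain = find_chain(FINDINGS, "network access", "root on device")
print(" -> ".join(chain))
```

None of the three findings reaches root on its own; the path only appears when they are combined, which is the point of reporting chains rather than isolated findings.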
3.2 Boundary Analysis
Probing every trust boundary identified in the threat model: USB, serial, BLE, Wi-Fi, cellular, cloud API, mobile companion, service tools, and clinician portals. The deliverable should show that each boundary was exercised, not just the obvious ones.
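A coverage check of this kind can be automated. The following is a minimal sketch that assumes the threat-model boundaries and the test log are available as simple Python structures; the test-case IDs and the `uncovered_boundaries` helper are hypothetical.

```python
# Boundaries as named in the threat model (taken from the list above).
THREAT_MODEL_BOUNDARIES = [
    "USB", "serial", "BLE", "Wi-Fi", "cellular",
    "cloud API", "mobile companion", "service tools", "clinician portals",
]

# Hypothetical test log: test-case ID -> boundary it exercised.
TESTS_RUN = {
    "TC-001": "USB",
    "TC-002": "serial",
    "TC-003": "BLE",
    "TC-004": "BLE",
    "TC-005": "Wi-Fi",
    "TC-006": "cloud API",
    "TC-007": "mobile companion",
}

def uncovered_boundaries(boundaries, tests_run):
    """Return the boundaries that no test case exercised, in threat-model order."""
    covered = set(tests_run.values())
    return [b for b in boundaries if b not in covered]

gaps = uncovered_boundaries(THREAT_MODEL_BOUNDARIES, TESTS_RUN)
print("Boundaries with no tests:", gaps)
```

An empty result is the goal; anything left in the list is either a gap to test or a deviation to justify in the report.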
3.3 Closed-Box (Black-Box) Testing
Testing from the attacker's starting position with no privileged knowledge - reverse engineering firmware, sniffing wireless protocols, fuzzing exposed services. Closed-box results are how you show the FDA reviewer that your security relies on real controls, not on obscurity.
3.4 Manual Exploit Verification
Every finding the team reports must include a working proof-of-concept or a documented reason it could not be exploited (with the conditions that would change that). Scanner output alone is not sufficient evidence under the 2026 guidance.
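The evidence rule above can be expressed as a simple pre-delivery check. This is a sketch under assumed field names; the `Finding` record and `has_sufficient_evidence` are illustrative, not a prescribed report schema.

```python
from dataclasses import dataclass

@dataclass
class Finding:
    # All field names are hypothetical; adapt them to your own report template.
    title: str
    poc_reference: str = ""           # e.g. script, packet capture, or video
    not_exploited_reason: str = ""    # why exploitation was not achieved
    conditions_for_exploit: str = ""  # what would have to change for it to work

def has_sufficient_evidence(finding):
    """A finding needs a PoC, or a documented reason plus the conditions that would change it."""
    if finding.poc_reference:
        return True
    return bool(finding.not_exploited_reason and finding.conditions_for_exploit)

findings = [
    Finding("Debug shell on UART", poc_reference="poc/uart_shell.py"),
    Finding("Outdated TLS library", not_exploited_reason="Vulnerable code path not reachable",
            conditions_for_exploit="Enabling the legacy cipher suite"),
    Finding("Open port flagged by scanner"),  # scanner output only
]
incomplete = [f.title for f in findings if not has_sufficient_evidence(f)]
print("Findings lacking evidence:", incomplete)
```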
4. Scope by Device Archetype
The same four test categories apply to every device, but the surface area and tooling differ wildly. Use this table to size an engagement before you scope it.
| Archetype | Primary attack surface | Specialist skills needed |
|---|---|---|
| Embedded / firmware device | UART, JTAG, SPI flash, secure boot, signed updates, debug interfaces | Hardware reversing, firmware extraction (chip-off, JTAG), Ghidra / IDA, signed-update bypass |
| SaMD (mobile + cloud) | Mobile app (iOS/Android), backend APIs, IAM, tenant isolation, secrets handling | OWASP MASVS, MobSF, Frida, Burp, cloud auth flow analysis |
| Implantable + BLE-paired companion | BLE pairing & bonding, GATT services, replay & MITM, companion app | Sniffers (Ubertooth, Sniffle), BLE fuzzing, ATT/GATT protocol analysis |
| Imaging modality / DICOM / PACS | DICOM/DIMSE, HL7, network segmentation, vendor service accounts | DICOM toolkits (dcm4che), HL7 fuzzing, hospital-network simulation |
| Hospital-network-connected device | Wired/wireless onboarding, 802.1X, default credentials, service portals | Network attack tooling, AD/Kerberos basics, segmentation testing |
| AI/ML SaMD | Model integrity, adversarial inputs, prompt injection, data poisoning, MLOps pipeline | Adversarial ML, model-signing review, pipeline IAM review |
A credible scoping conversation names the archetype, lists the boundaries from the threat model, and assigns hours to each.
5. The Test Environment Problem
The 2026 guidance expects testing against a production-equivalent configuration. In practice this means:
- The exact hardware revision that will ship (not an early prototype with extra debug headers).
- The release-candidate firmware/software build, signed with production keys where possible.
- Surrogate networks, surrogate patient data, and - for hospital-connected devices - a hospital-network simulator (EHR, PACS, DHCP, NTP).
- Documented deviations from production, with a justification for why each deviation doesn't change the security posture.
If the FDA reviewer can't tell from the report what was tested, the test doesn't count. Include build hashes, hardware serials, and network diagrams.
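Recording a build hash takes only a few lines. The sketch below uses Python's standard `hashlib` to compute a SHA-256 digest of a firmware image and pair it with a hardware serial; the file name, serial number, and `build_record` helper are placeholders, and the demo hashes a stand-in file rather than a real image.

```python
import hashlib
import os
import tempfile

def sha256_of_file(path, chunk_size=65536):
    """Stream the file through SHA-256 so large images need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_record(path, hardware_serial):
    """Bundle the identifiers a reviewer needs to tell which build was tested."""
    return {
        "image": os.path.basename(path),
        "sha256": sha256_of_file(path),
        "hardware_serial": hardware_serial,
    }

# Demo with a stand-in file; a real run would point at the release-candidate image.
with tempfile.TemporaryDirectory() as tmp:
    image = os.path.join(tmp, "firmware-rc.bin")  # hypothetical file name
    with open(image, "wb") as f:
        f.write(b"abc")
    record = build_record(image, hardware_serial="SN-0001")  # hypothetical serial

print(record)
```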
6. Methodologies the FDA Recognizes
Cite the methodology you actually followed. The 2026 guidance and AAMI TIR57 explicitly reference:
- OWASP MASVS / MASTG (formerly MSTG) - mobile companion apps.
- OWASP ASVS - web/cloud backends and clinician portals.
- OWASP IoT Top 10 - embedded device weakness classes.
- ISA/IEC 62443-4-2 - component-level cybersecurity requirements.
- NIST SP 800-115 - overall testing process and reporting structure.
- AAMI TIR57 - security risk management linkage between threats, tests, and risk file.
A report that names its methodology, version, and the specific controls tested is far harder for a reviewer to push back on.
7. What a Credible Pen-Test Deliverable Contains
The minimum the FDA expects (and the minimum we ship on every engagement):
- Executive summary - one page, risk-rated, non-technical.
- Scope statement - device, build, environment, dates, in/out of scope.
- Methodology - which standards were followed, which tools used.
- Boundary coverage matrix - every threat-model boundary mapped to the tests run against it.
- Findings - each with CVSS v3.1, EPSS, exploitability narrative, screenshots/PoCs, remediation guidance, and a CWE.
- Chained attack paths - at least one end-to-end narrative per device.
- Retest results - findings that were fixed and re-verified.
- Residual risk statement - aligned to your ISO 14971 risk file.
- Tester attestation - independence, qualifications, hours.
If your current vendor's report is a Nessus export with a cover page, you do not have a submission-ready deliverable. See our companion piece, 12 Critical Findings from Medical Device Penetration Tests, for what the body of a real report looks like.
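For readers who want to sanity-check the CVSS v3.1 base scores in a findings section, here is a compact sketch of the base-score equations from the CVSS v3.1 specification. It handles base metrics only (no temporal or environmental metrics) and does no input validation; the function names are ours.

```python
import math

# Metric weights from the CVSS v3.1 specification (base metrics only).
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}

def roundup(value):
    """Round up to one decimal place, using the integer method from the v3.1 spec."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0

def base_score(vector):
    """Compute the base score for a vector such as 'CVSS:3.1/AV:N/AC:L/...'."""
    m = dict(part.split(":") for part in vector.split("/")[1:])
    changed = m["S"] == "C"
    iss = 1 - (1 - CIA[m["C"]]) * (1 - CIA[m["I"]]) * (1 - CIA[m["A"]])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[m["PR"]]
    exploitability = 8.22 * AV[m["AV"]] * AC[m["AC"]] * pr * UI[m["UI"]]
    if impact <= 0:
        return 0.0
    total = impact + exploitability
    if changed:
        total *= 1.08
    return roundup(min(total, 10))

remote = base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H")
physical = base_score("CVSS:3.1/AV:P/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H")
print(remote, physical)
```

The two example vectors differ only in attack vector: the same full-impact flaw scores lower when it requires physical access. That is why the exploitability narrative matters alongside the number.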
8. Pen Test vs Red Team vs Tabletop vs Threat Model
| Activity | Goal | Output | When you need it |
|---|---|---|---|
| Threat model | Identify what could go wrong | STRIDE/attack-tree diagrams, risk register | Pre-design and continuously |
| Penetration test | Prove which threats are real on the built device | Findings, PoCs, residual risk | Before every FDA submission |
| Red team | Test detection + response across people/process | Adversary narrative, control gaps | Mature postmarket programs |
| Tabletop | Stress-test the incident-response plan | Lessons learned, playbook updates | Annually, postmarket |
Most manufacturers need a threat model and a pen test for premarket; red teams and tabletops belong to the postmarket cybersecurity program.
9. Cost and Timeline Ranges
Transparent ranges so you can budget. Actual quotes depend on archetype and number of interfaces:
| Device archetype | Typical duration | Typical investment (USD) |
|---|---|---|
| SaMD only (mobile + cloud) | 2 - 3 weeks | $25K - $55K |
| Single embedded device, no wireless | 3 - 4 weeks | $40K - $75K |
| Embedded + BLE companion app | 4 - 6 weeks | $60K - $110K |
| Imaging / DICOM / hospital-connected | 5 - 8 weeks | $75K - $150K |
| Multi-device platform (gateway + nodes + cloud) | 8 - 12 weeks | $120K - $250K+ |
Quotes below these ranges almost always omit one of the four test categories. Quotes above usually include premarket consulting, retest, and FDA Q-response support.
10. Frequently Asked Questions
Does the FDA actually require a pen test? Yes. Section 524B(b)(2) requires testing evidence, and the February 2026 final guidance explicitly calls out penetration testing in Section V.B.4.
Can our internal security team do it? No. The 2026 guidance expects independent testers - organizationally separate from the development team. An internal team can supplement, not replace, an independent assessment.
When in the development cycle should we test? Against the release-candidate build, with enough runway to remediate critical findings and retest before submission. Most manufacturers schedule the pen test 8 - 12 weeks before their target FDA filing date.
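To turn that rule of thumb into calendar dates, a short sketch using the standard `datetime` module is enough. The filing date below is hypothetical, and the 8 - 12 week window is the range quoted in the answer above, not a regulatory requirement.

```python
from datetime import date, timedelta

def pen_test_start_window(filing_date, earliest_weeks=12, latest_weeks=8):
    """Return (earliest, latest) start dates counted back from the filing date."""
    earliest = filing_date - timedelta(weeks=earliest_weeks)
    latest = filing_date - timedelta(weeks=latest_weeks)
    return earliest, latest

# Hypothetical target filing date.
earliest, latest = pen_test_start_window(date(2026, 12, 1))
print(f"Start the pen test between {earliest} and {latest}")
```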
Does a pen test cover SBOM / vulnerability management? Partly. The pen test will exercise components in your SBOM and may surface exploitable CVEs, but the ongoing CVE-matching loop is a separate program. See our SBOM vulnerability management guide.
Do we need a new pen test for every software update? No. Material design changes - new interfaces, new wireless, new cloud integration, new AI/ML model - trigger a retest. Routine maintenance releases do not, provided your postmarket vulnerability-management plan is operating.
How does this differ for Class III / PMA devices? Higher scrutiny, more required artifacts, longer engagements. See FDA PMA cybersecurity requirements and FDA pathway cybersecurity differences.
11. Where to Go Next
- Service overview: Medical Device Penetration Testing
- Requirements deep-dive: Medical Device Pen Test Requirements
- Field findings: 12 Critical Findings from Medical Device Penetration Tests
- Premarket roadmap: FDA Premarket Cybersecurity Submission Checklist
- Standards context: The MedTech Cybersecurity Standards Decoder
Ready to scope a pen test against the February 2026 guidance? Contact Blue Goat Cyber for a fixed-scope, fixed-price proposal.