FDA Penetration Testing Requirements

What the FDA's Feb 2026 premarket guidance actually requires for medical device penetration testing - what's inside a real pen test, what's separate.

By Christian Espinosa, MBA, CISSP

Founder & CEO · Blue Goat Cyber

Published: June 2, 2026

Key Takeaways

The FDA lists ten distinct testing activities. Penetration testing is one of them, not all of them.
A narrow "exploit and report" pen test leaves nine other evidence gaps for reviewers to flag.
Blue Goat's pen test engagement bundles attack surface analysis, abuse/misuse testing, fuzz testing, robustness testing, closed-box scanning, exploitation, and chaining into one evidence package.
SAST, DAST, [SBOM](/services/fda-compliant-sbom-services-for-medtech "FDA-compliant SBOM services")-driven SCA, hardcoded credential scanning, and continuous fuzzing in CI are separate dev-team activities and stay separate. The FDA wants evidence of both.
The pen test report must include the five required elements: tester independence and expertise, scope, duration, methods, and results.
ANSI/ISA 62443-4-1 §9 is the referenced process standard for vulnerability testing. The bundled engagement is structured to satisfy it.

Why this matters

The FDA's Cybersecurity in Medical Devices: Quality Management System Considerations and Content of Premarket Submissions (Feb 3, 2026 final guidance) made cybersecurity documentation a gating criterion for clearance under Section 524B of the FD&C Act. Reviewers now apply this guidance to medical device penetration testing evidence the same way they apply software lifecycle expectations from IEC 62304 and security risk-management expectations from AAMI TIR57 and ANSI/AAMI SW96:2023.

Gaps in this area are the single most common driver of first-cycle cybersecurity Additional Information (AI) requests. The FDA's FY2024 CDRH performance reports show cybersecurity is among the top deficiency categories cited in 510(k) and PMA AI letters, behind only software documentation and clinical evidence. Treating it as a checklist exercise rather than a design-controlled engineering artifact is what creates the gap.

The misconception that triggers deficiency letters

The most common pattern we see in deficiency letters: the manufacturer submits a clean pen test report, and the FDA reviewer comes back asking for fuzz testing evidence, abuse case testing, an attack surface analysis, or vulnerability chaining results. The manufacturer assumed "we did a pen test" covered all of it. It didn't.

The 2026 final guidance is explicit. Under the testing section, it lists penetration testing as one bullet inside a longer list of required activities:

Vulnerability testing (as described in ANSI/ISA 62443-4-1)
Abuse or misuse cases, malformed and unexpected inputs
Robustness
Fuzz testing
Attack surface analysis
Vulnerability chaining
Closed-box testing of known vulnerability scanning
Software composition analysis of binary executable files
Static and dynamic code analysis, including testing for credentials that are hardcoded, default, easily guessed, and easily compromised
Penetration testing

Each of those bullets generates an evidence expectation. If your pen test report only addresses the last one, reviewers will ask about the other nine.
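A quick way to see the gap is to treat the ten bullets as a checklist and list which ones have no evidence attached. The sketch below is illustrative only: the `evidence_gaps` helper and the artifact name are hypothetical, and the bullet labels are shortened from the list above.

```python
# The ten testing activities as listed in this post (labels shortened).
GUIDANCE_BULLETS = [
    "Vulnerability testing (ANSI/ISA 62443-4-1)",
    "Abuse or misuse cases, malformed and unexpected inputs",
    "Robustness",
    "Fuzz testing",
    "Attack surface analysis",
    "Vulnerability chaining",
    "Closed-box known-vulnerability scanning",
    "Software composition analysis of binaries",
    "Static and dynamic code analysis",
    "Penetration testing",
]

def evidence_gaps(evidence):
    """Return the bullets that have no evidence artifact attached."""
    return [bullet for bullet in GUIDANCE_BULLETS if not evidence.get(bullet)]

# Hypothetical narrow submission: one report, mapped to one bullet.
narrow_submission = {"Penetration testing": ["Pen test report v1.0"]}

gaps = evidence_gaps(narrow_submission)
print(f"{len(gaps)} bullets without evidence:")
for bullet in gaps:
    print(" -", bullet)
```

Running the same check against the full submission package before filing shows which questions a reviewer is likely to ask.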

What's actually inside a medical device pen test

A pen test scoped for the 2026 guidance is not a one-week web app assessment. It is a multi-discipline engagement against a physical or software device, its companion mobile app, its cloud backend, its wireless interfaces, and its update channels. Done right, the engagement itself produces evidence for most of the testing bullets above, not just the "penetration testing" line item.

1. Attack surface analysis (as a written artifact)

The first phase enumerates every interface that can receive input: network ports, BLE, Wi-Fi, proprietary RF, USB, serial, JTAG/SWD, web UIs, REST and gRPC APIs, mobile app endpoints, OTA update channels, and any cloud-to-device or device-to-cloud paths. This is not just internal recon - it is delivered as a written attack surface analysis tied to the threat model, which is what the FDA wants as the standalone "attack surface analysis" deliverable.
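One way to keep the written attack surface analysis tied to the threat model is to hold the interface inventory as structured data and check it before the report is drafted. This is a minimal sketch under assumptions: the `Interface` schema, the interface names, and threat IDs such as `TM-01` are hypothetical and are not prescribed by the guidance.

```python
from dataclasses import dataclass, field

@dataclass
class Interface:
    """One input-accepting interface in the inventory (hypothetical schema)."""
    name: str
    category: str                      # e.g. "wireless", "debug", "network"
    in_scope: bool = True
    exclusion_rationale: str = ""
    threat_ids: list = field(default_factory=list)  # threat model entries

def inventory_problems(interfaces):
    """Return problems a reviewer would likely flag in the inventory."""
    problems = []
    for iface in interfaces:
        if not iface.threat_ids:
            problems.append(f"{iface.name}: not linked to any threat model entry")
        if not iface.in_scope and not iface.exclusion_rationale:
            problems.append(f"{iface.name}: excluded from scope without a rationale")
    return problems

inventory = [
    Interface("BLE GATT", "wireless", threat_ids=["TM-01"]),
    Interface("JTAG/SWD", "debug", in_scope=False, threat_ids=["TM-07"]),
    Interface("OTA update channel", "network"),
]

for problem in inventory_problems(inventory):
    print(problem)
```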

2. Abuse and misuse case testing

The threat model defines abuse cases (an attacker actively trying to harm the patient or compromise data) and misuse cases (a clinician or patient using the device in foreseeable but unintended ways). The pen tester executes those cases against the built device - bypassing intended workflows, supplying malformed or out-of-sequence inputs, triggering safety interlocks, and confirming whether the device fails safely or fails dangerously.

3. Robustness testing

Robustness here means the device's ability to maintain safe behavior under abnormal but plausible conditions: dropped network connections, malformed packets, power instability, interrupted firmware updates, sensor noise, and out-of-order protocol messages. Robustness testing is distinct from fuzzing - it targets operational stress, not input-parser bugs - but both belong in the same engagement.

4. Fuzz testing (targeted, engagement-scoped)

The pen tester runs targeted fuzzing against exposed interfaces: BLE GATT services, network protocol parsers, file format handlers, cloud APIs, and any custom binary protocols. The goal is to find crashes, hangs, memory corruption, and unsafe state transitions that conventional test cases miss. This is engagement-scoped fuzzing - not the continuous CI fuzzing that the dev team should also be running (more on that below).
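To make the idea concrete, here is a minimal mutation-fuzzing sketch. Everything in it is an assumption for illustration: `parse_frame` is a made-up length-prefixed parser with a deliberate bug, standing in for a device protocol handler, and a real engagement would use purpose-built fuzzers against the actual interface rather than this loop.

```python
import random

def parse_frame(data: bytes) -> bytes:
    """Hypothetical parser: [type][length][payload...][checksum]."""
    if len(data) < 2:
        raise ValueError("frame too short")      # documented, clean rejection
    length = data[1]
    payload = data[2:2 + length]
    checksum = data[2 + length]                  # BUG: no bounds check
    if checksum != sum(payload) % 256:
        raise ValueError("bad checksum")
    return payload

def mutate(seed: bytes, rng: random.Random) -> bytes:
    """Apply one random mutation: overwrite a byte, truncate, or append junk."""
    data = bytearray(seed)
    choice = rng.randrange(3)
    if choice == 0 and data:
        data[rng.randrange(len(data))] = rng.randrange(256)
    elif choice == 1 and data:
        del data[rng.randrange(len(data)):]
    else:
        data += bytes(rng.randrange(256) for _ in range(rng.randrange(1, 8)))
    return bytes(data)

def fuzz(seed: bytes, iterations: int = 2000, rng_seed: int = 0):
    """Return inputs that raised anything other than the documented ValueError."""
    rng = random.Random(rng_seed)
    crashes = []
    for _ in range(iterations):
        case = mutate(seed, rng)
        try:
            parse_frame(case)
        except ValueError:
            pass                                 # clean rejection, not a finding
        except Exception as exc:                 # unexpected failure: triage it
            crashes.append((case, type(exc).__name__))
    return crashes

valid_frame = bytes([0x01, 0x03, 0x0A, 0x0B, 0x0C, 0x21])
crashes = fuzz(valid_frame)
print(f"{len(crashes)} crashing inputs found")
```

Each crashing input becomes a reproducible test case in the report. On a device, the equivalent signal is a hang, reboot, or unsafe state transition rather than a Python exception.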

5. Closed-box known-vulnerability scanning

Automated scanning of the device, its exposed services, and any web/cloud surface for known CVEs and misconfigurations. Closed-box means without access to source code or internal documentation - the tester sees what an external attacker sees. This generates the closed-box scanning evidence the guidance asks for.

6. Vulnerability identification and exploitation

The defining activity of pen testing. Manual analysis, protocol reverse engineering, traffic inspection, firmware extraction, and active exploitation. Findings are not theoretical - they are proven with reproducible proof-of-concept evidence and CVSS scoring.

7. Vulnerability chaining

The guidance lists this separately, but in practice it is performed by the pen tester. Chaining combines low- and medium-severity findings into high-impact attack paths - for example, an information disclosure plus a weak session token plus an unauthenticated API call becomes full remote control of therapy. Reviewers specifically look for chaining analysis because individual CVSS scores understate real-world risk.
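Chaining analysis can be modeled as a path search: each finding turns one attacker capability into another, and a chain is a path from the starting position to a high-impact outcome. The sketch below mirrors the example above. The finding IDs and capability names are hypothetical.

```python
from collections import deque

# Hypothetical findings: (ID, capability required, capability gained, severity)
FINDINGS = [
    ("F-01", "network access", "device serial number", "low"),         # info disclosure
    ("F-02", "device serial number", "valid session token", "medium"), # weak token
    ("F-03", "valid session token", "therapy control", "medium"),      # unauthenticated API
    ("F-04", "physical access", "firmware image", "medium"),
]

def find_chain(findings, start, goal):
    """Breadth-first search for the shortest chain of findings from start to goal."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        capability, path = queue.popleft()
        if capability == goal:
            return path
        for finding_id, needs, gains, _severity in findings:
            if needs == capability and gains not in seen:
                seen.add(gains)
                queue.append((gains, path + [finding_id]))
    return None  # no chain exists from this starting position

chain = find_chain(FINDINGS, "network access", "therapy control")
print(" -> ".join(chain))
```

None of the three findings in the chain is rated high on its own, yet together they reach therapy control from the network. The attack path analysis documents that difference.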

8. Post-exploitation

What an attacker could actually do after a successful compromise: persistence on the device, lateral movement from the device to the cloud backend or to other devices on the same network, data exfiltration, denial of therapy, and integrity attacks on logged clinical data.

9. Hardware, RF, wireless, mobile, and cloud coverage

For SiMD (Software in a Medical Device), this includes JTAG/SWD debug access, chip-off and glitching attacks, firmware extraction from flash, Secure Boot bypass attempts, BLE pairing and authentication attacks, Wi-Fi configuration weaknesses, proprietary RF replay and spoofing, mobile app reverse engineering, certificate pinning bypass, and full cloud API testing including IDOR, token handling, and authentication boundaries.

10. The reviewer-format report

The deliverable that the FDA actually reads. It must contain the five elements explicitly called out in the guidance: independence and technical expertise of testers, scope of testing, duration of testing, testing methods employed, and test results, findings, and observations. We cover the report format in detail further down.

What is NOT in a pen test (and why)

Being honest about the boundary is part of the value. The following activities are required by the 2026 guidance but sit outside the pen test engagement. They are continuous, code-side, or CI-integrated activities performed by the development team, and they need their own evidence:

SAST (static application security testing)

Source-code-level scanning for vulnerabilities, dangerous APIs, insecure patterns, and hardcoded credentials. SAST requires source access, runs on every commit in CI, and is owned by the development team. A pen tester operating in a closed-box engagement cannot produce SAST evidence.

DAST (dynamic application security testing)

Automated runtime scanning of running applications, typically integrated into staging environments and CI pipelines. DAST overlaps in spirit with what pen testers do manually, but the FDA wants the automated, continuous, dev-owned evidence - not just an annual engagement snapshot.

SCA on binary executables (SBOM-driven)

Software composition analysis against binaries - extracting an SBOM, matching components to known CVEs, and tracking the exploitability status of each finding in a VEX document. This is ongoing post-build work tied to the SBOM and VEX program, not a one-time pen test activity.
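The matching step can be sketched in a few lines. This is an illustration under assumptions, not a real SCA tool: the component names, versions, and advisory IDs are placeholders rather than real CVEs, and the status strings follow the commonly used VEX vocabulary (`not_affected`, `affected`, `fixed`, `under_investigation`).

```python
# Hypothetical SBOM extracted from a binary.
SBOM = [
    {"name": "examplessl", "version": "1.0.2"},
    {"name": "tinyjson", "version": "3.1.0"},
    {"name": "micrortos", "version": "5.4.0"},
]

# Hypothetical advisory feed; IDs are placeholders, not real CVE numbers.
ADVISORIES = [
    {"id": "EXAMPLE-0001", "name": "examplessl", "affected_versions": {"1.0.1", "1.0.2"}},
    {"id": "EXAMPLE-0002", "name": "tinyjson", "affected_versions": {"3.1.0"}},
    {"id": "EXAMPLE-0003", "name": "micrortos", "affected_versions": {"4.0.0"}},
]

# VEX statements recorded so far, keyed by advisory ID.
VEX = {"EXAMPLE-0001": "not_affected"}

def triage(sbom, advisories, vex):
    """Match components to advisories and attach the current VEX status."""
    rows = []
    for component in sbom:
        for advisory in advisories:
            same_name = advisory["name"] == component["name"]
            affected = component["version"] in advisory["affected_versions"]
            if same_name and affected:
                status = vex.get(advisory["id"], "under_investigation")
                rows.append((component["name"], component["version"],
                             advisory["id"], status))
    return rows

for row in triage(SBOM, ADVISORIES, VEX):
    print(row)
```

Any row still marked `under_investigation` at submission time is an open question the reviewer can ask about.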

Hardcoded credential scanning

The guidance specifically calls out testing for hardcoded, default, easily guessed, and easily compromised credentials. This is primarily a SAST output (greps and entropy checks against source) plus secrets-scanning hooks on the repository. A pen tester will surface any credentials they find during exploitation, but the systematic coverage is a dev-team CI activity.
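As a rough illustration of the "greps and entropy checks" idea, the sketch below flags credential-like assignments whose value is either a well-known default or a high-entropy literal. The regular expression, the default list, and the 3.5-bit threshold are assumptions chosen for the example. Production secrets scanners use far larger rule sets.

```python
import math
import re

def shannon_entropy(s: str) -> float:
    """Bits of entropy per character, from character frequencies in s."""
    if not s:
        return 0.0
    counts = {c: s.count(c) for c in set(s)}
    return -sum((n / len(s)) * math.log2(n / len(s)) for n in counts.values())

# Matches assignments such as password = "..." or api_key: '...'
ASSIGNMENT = re.compile(
    r"""(?i)\b(password|passwd|secret|token|api_key)\b\s*[:=]\s*["']([^"']+)["']"""
)
DEFAULTS = {"admin", "password", "1234", "changeme"}  # illustrative, not exhaustive

def scan(source: str, entropy_threshold: float = 3.5):
    """Return (line number, reason) for each suspicious credential assignment."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = ASSIGNMENT.search(line)
        if not match:
            continue
        value = match.group(2)
        if value.lower() in DEFAULTS:
            findings.append((lineno, "default or easily guessed credential"))
        elif shannon_entropy(value) >= entropy_threshold:
            findings.append((lineno, "high-entropy literal, possible hardcoded secret"))
    return findings

sample = 'password = "admin"\napi_key = "9fQ2xLr7VbT0zKpW4sYh"\ntimeout = "30"\n'
for lineno, reason in scan(sample):
    print(f"line {lineno}: {reason}")
```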

Continuous fuzzing in CI

Different from the targeted fuzzing in the pen test. Continuous fuzzing runs against parsers, protocol handlers, and library boundaries on every build or nightly, accumulating coverage and corpus over months. The pen test fuzzing is targeted and time-boxed. Mature programs do both, and the FDA wants evidence of both.

The clean way to talk about this in your submission: the pen test report covers the engagement-scoped activities, and a separate "secure development testing evidence" section covers the continuous SAST, DAST, SCA, secrets scanning, and CI fuzzing outputs.

ANSI/ISA 62443-4-1 §9 - the referenced process standard

The guidance explicitly points to ANSI/ISA 62443-4-1 for vulnerability testing. The relevant section is §9, which defines the security verification and validation testing process. The short version of what §9 expects:

9.2 Security requirements testing - verifying that every security requirement has a corresponding test case and evidence.
9.3 Threat mitigation testing - verifying that each threat in the threat model has at least one test case that exercises the mitigation.
9.4 Vulnerability testing - the testing activities listed earlier in this post (abuse cases, fuzzing, attack surface, etc.).
9.5 Penetration testing - adversarial exploitation against the integrated product.

A bundled pen test engagement designed around 62443-4-1 §9 naturally produces traceability between the threat model, the test cases, and the test results. That traceability is what survives reviewer scrutiny.
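Traceability is easy to check mechanically once the threat model, test cases, and results share identifiers. The sketch below assumes a hypothetical ID scheme (`TM-xx` for threats, `TC-xxx` for test cases) and flags the two gaps reviewers tend to find: threats with no test case, and test cases with no recorded result.

```python
# Hypothetical threat model entries, test cases, and results.
THREATS = ["TM-01", "TM-02", "TM-03"]
TEST_CASES = {            # test case ID -> threat model entry it exercises
    "TC-101": "TM-01",
    "TC-102": "TM-01",
    "TC-201": "TM-02",
}
RESULTS = {"TC-101": "pass", "TC-102": "fail"}   # TC-201 has not been run

def traceability_gaps(threats, test_cases, results):
    """Return gaps in the threat -> test case -> result chain."""
    gaps = []
    for threat in threats:
        cases = [tc for tc, mapped in test_cases.items() if mapped == threat]
        if not cases:
            gaps.append(f"{threat}: no test case")
        for tc in cases:
            if tc not in results:
                gaps.append(f"{threat}: {tc} has no recorded result")
    return gaps

for gap in traceability_gaps(THREATS, TEST_CASES, RESULTS):
    print(gap)
```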

The 5 required pen test report elements (and reviewer red flags)

See also: Breakthrough Device Designation and Cybersecurity, Medical Device Pen Testing: FDA vs EU MDR 2026, and Letter to File vs New 510(k).

The guidance explicitly requires that pen test reports include all five of the following. Missing or weak coverage of any one almost always draws a deficiency letter.

1. Independence and technical expertise of testers

Reviewers want named testers, their credentials (OSCP, OSCE, GXPN, CRTO, hardware-specific certifications), and clear separation from the development team. Red flag: a report with no tester names, no credentials, or testers who are also listed as developers on the device.

2. Scope of testing

A precise enumeration of what was in scope and what was out of scope - and the rationale for any exclusions. Red flag: vague scope ("the device and its companion app") with no interface-level breakdown, or out-of-scope items that are obvious attack paths from the threat model.

3. Duration of testing

Actual tester-days, not calendar duration. Reviewers know what realistic durations look like for a given device complexity. Red flag: a complex connected device "pen tested" in three days. That signals checkbox testing.

4. Testing methods employed

The methodology (OWASP, PTES, NIST SP 800-115), the tools used, the test cases run, and the rationale for the chosen approach. Red flag: "industry-standard methodology" with no specifics.

5. Test results, findings, and observations

Findings with CVSS scores, reproduction steps, evidence (screenshots, packet captures, exploit code), remediation recommendations, and retest results after fixes. Red flag: findings without evidence, or no retest section.
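Reviewers can recompute a base score from the vector, so the two should agree. The sketch below computes a CVSS v3.1 base score from a vector string, with weights and rounding transcribed from the FIRST CVSS v3.1 specification. Treat it as a cross-check and confirm against the official calculator. It does not handle temporal or environmental metrics, and it does no input validation.

```python
import math

# CVSS v3.1 base metric weights.
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}

def roundup(value: float) -> float:
    """Round up to one decimal place, using the spec's integer-based method."""
    as_int = round(value * 100000)
    if as_int % 10000 == 0:
        return as_int / 100000.0
    return (math.floor(as_int / 10000) + 1) / 10.0

def base_score(vector: str) -> float:
    """Compute the base score for a vector like 'CVSS:3.1/AV:N/AC:L/...'."""
    metrics = dict(part.split(":") for part in vector.split("/")[1:])
    changed = metrics["S"] == "C"
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[metrics["PR"]]
    iss = 1 - (1 - CIA[metrics["C"]]) * (1 - CIA[metrics["I"]]) * (1 - CIA[metrics["A"]])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    exploitability = 8.22 * AV[metrics["AV"]] * AC[metrics["AC"]] * pr * UI[metrics["UI"]]
    if impact <= 0:
        return 0.0
    total = impact + exploitability
    return roundup(min(1.08 * total, 10) if changed else min(total, 10))

print(base_score("CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H"))
```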

What a reviewer-format report actually looks like

"Reviewer-format" means a report structured so the FDA cybersecurity reviewer can drop it straight into the eSTAR cybersecurity attachments and find each of the five required elements within thirty seconds. The structure matters as much as the content - a report with all the right information buried in the wrong places still triggers deficiencies.

Document structure that survives review

1  Executive Summary           Tester independence, scope, duration, headline results
2  Scope and Methodology       In-scope/out-of-scope interfaces, versions, environment, methods, tools
3  Testing Activities          One subsection per guidance bullet (attack surface, abuse cases, robustness, fuzz, closed-box scan)
4  Exploitation and Chaining   Findings table + attack path analysis
5  Findings Detail             Per-finding pages with CVSS, repro, evidence, recommendation
6  Retest Results              What was fixed, what was verified, residual risk
7  Traceability Matrix         Threat model entry → test case → result
A  Tester Bios and Credentials Named testers, certifications, independence statement

A reviewer can confirm all five required elements without leaving sections 1 and 2. The detail in sections 3-7 is there for the technical review that follows.

What each finding entry should contain

Every finding in section 5 should be a self-contained page with:

Title and unique finding ID
CVSS v3.1 vector and score (base, temporal if relevant)
Affected component and version
Description of the vulnerability
Reproduction steps that a reviewer could follow
Evidence: screenshots, packet captures, exploit code, decompiled snippets
Impact tied to patient safety and the threat model entry it maps to
Recommended remediation
Retest result and date

Findings without CVSS scores, without reproduction steps, or without evidence are the most common deficiency trigger in this section.
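A completeness check over the findings catches these omissions before the report ships. The field names below are hypothetical and simply mirror the checklist above. The sample finding is invented for illustration.

```python
# Hypothetical field names mirroring the per-finding checklist.
REQUIRED_FIELDS = [
    "finding_id", "title", "cvss_vector", "cvss_score",
    "affected_component", "description", "reproduction_steps",
    "evidence", "impact", "threat_model_entry", "remediation",
    "retest_result", "retest_date",
]

def missing_fields(finding: dict) -> list:
    """Return required fields that are absent or empty in a finding record."""
    return [name for name in REQUIRED_FIELDS if finding.get(name) in (None, "", [])]

# Invented example finding: evidence list is empty and no retest is recorded.
finding = {
    "finding_id": "F-03",
    "title": "Therapy API accepts requests without authorization",
    "cvss_vector": "CVSS:3.1/AV:N/AC:L/PR:L/UI:N/S:U/C:N/I:H/A:H",
    "cvss_score": 8.1,
    "affected_component": "cloud-api 2.4.1",
    "description": "Authenticated users can issue therapy commands for any device.",
    "reproduction_steps": ["Log in as user A", "Send command with device ID of user B"],
    "evidence": [],
    "impact": "Unauthorized change to therapy settings",
    "threat_model_entry": "TM-02",
    "remediation": "Enforce per-device authorization on every command",
}

print(missing_fields(finding))
```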

Front-matter that answers the 5 elements immediately

The executive summary should explicitly call out, in this order:

Independence statement - "Testing was performed by [named testers] of [firm], who have no development relationship with [manufacturer]. Tester credentials are listed in Appendix A."
Scope summary - one paragraph naming every interface tested and every interface excluded with rationale.
Duration - actual tester-days, broken out by phase if multi-discipline.
Methodology - named frameworks (OWASP MASTG, PTES, NIST SP 800-115) and primary tools.
Results summary - finding counts by severity, retest status, and residual risk statement.

A reviewer reading only the first two pages should be able to check off all five required elements.

Anti-patterns that trigger deficiencies

Reports that fail review usually share one or more of these traits:

Marketing-style executive summary with vendor logos on every page and no scope detail
"Approximately two weeks" instead of actual tester-days
"Industry-standard methodology" with no named frameworks or tools
Findings without CVSS scores, without reproduction steps, or without evidence
No retest section - fixes are "recommended" but never verified
No named testers or credentials - just a firm name
Out-of-scope items that are obvious attack paths from the threat model (e.g., excluding BLE on a BLE-connected device)
Findings buried in a 60-page appendix with no severity table up front

How a bundled engagement prevents deficiency letters

The pattern is consistent across the deficiency letters we respond to: reviewers ask the manufacturer to produce evidence for one of the testing activities that wasn't covered. The fix at submission time is to anticipate the question.

A bundled pen test report explicitly maps each section back to the 2026 guidance bullet it satisfies:

Guidance bullet                              Report section
-----------------------------------------    ----------------------------------
Attack surface analysis                      3.1 Attack Surface Analysis
Abuse/misuse cases, malformed inputs         3.2 Abuse and Misuse Case Testing
Robustness                                   3.3 Robustness Testing
Fuzz testing                                 3.4 Targeted Fuzz Testing
Closed-box known-vuln scanning               3.5 Closed-Box Vulnerability Scan
Vulnerability identification                 4.1 Findings
Vulnerability chaining                       4.2 Attack Path Analysis
Penetration testing                          4 Exploitation Results
Required report elements (1-5)               1 Executive Summary, 2 Scope/Methods

Reviewers reading that table know immediately that the testing section of the submission is covered. The questions stop before they start.

How Blue Goat approaches this

Blue Goat Cyber's medical device practice is led by engineers with CISSP, OSCP, and prior military red-team backgrounds. We treat cybersecurity documentation as design-controlled engineering output, not a submission template: every artifact (threat model, SBOM, security risk assessment, penetration test, labeling) traces back to a controlled requirement and a verified result.

Our engagements deliver the full Feb 3, 2026 guidance documentation set scoped to the device's risk profile, integrated with the existing IEC 62304 software lifecycle and ISO 14971 risk file. See our medical device cybersecurity services for the full scope. If the FDA raises cybersecurity deficiencies after our submission, we resolve them at no additional cost.

FAQ

My current vendor's pen test didn't include fuzz testing - is that enough for the FDA?

No. The guidance lists fuzz testing as a distinct required activity. You need engagement-scoped fuzz testing in the pen test plus, ideally, continuous fuzzing evidence from CI. A pen test that doesn't address fuzzing leaves a known gap reviewers will flag.

Do I still need SAST and DAST if the pen test covers everything else?

Yes. SAST and DAST are separate guidance bullets, performed by the dev team on every build. The pen test cannot replace them and is not intended to. Submit both.

Is robustness testing the same as fuzz testing?

No. Robustness targets operational stress (dropped connections, power instability, malformed packets, environmental conditions). Fuzz testing targets parser and protocol bugs from malformed input. They cover different failure classes and the guidance lists them separately.

Who counts as an "independent" tester?

A tester who is not on the development team, does not report to the development organization, and has no conflict of interest in the findings. Internal security teams can qualify if organizationally separate, but most reviewers prefer external third parties for the primary pen test.

How long should a medical device pen test take?

For a typical connected SiMD with a mobile app and cloud backend, two to four weeks of active testing is realistic. SaMD-only with a single cloud surface can be shorter. Hardware-intensive devices with custom RF or implantable components run longer. A three-day "pen test" on a complex connected device is a red flag.

Does the 2026 guidance change what pen testing looks like vs the 2023 guidance?

The bullet list of required testing activities is consistent, but the 2026 final guidance ties pen testing more tightly to the threat model and to Section 524B's "reasonable assurance of cybersecurity" standard. The bar for evidence and traceability is higher than under the 2023 guidance.

Final thoughts

The shortest path through an FDA cybersecurity review is a pen test engagement that produces evidence for as many of the ten testing requirements as one engagement can cover, plus a clear handoff to the dev team's CI-side evidence for the rest. Narrow pen tests don't fail because they're bad - they fail because they leave the manufacturer holding nine other evidence gaps the reviewer is going to ask about.

If you want a pen test engagement scoped against the Feb 3, 2026 guidance bullets - with the attack surface analysis, abuse case testing, fuzz testing, robustness testing, closed-box scanning, exploitation, chaining, and a reviewer-format report all delivered as one package - contact us and we'll scope it.

Need a written gap check against these requirements? Our Medical Device Pen Test Requirements gap check returns a one-business-day written analysis against Section 524B, the Feb 2026 guidance, and AAMI TIR57 / ANSI/AAMI SW96:2023 - free.

About the author

Christian Espinosa, MBA, CISSP · Founder & CEO, Blue Goat Cyber

U.S. Air Force Academy graduate and veteran with 30+ years in cybersecurity. Founded Alpine Security in 2014 (acquired 2020), then Blue Goat Cyber in 2022. Has supported 250+ FDA medical device submissions; no client has failed to clear due to cybersecurity. Author of three books including The Smartest Person in the Room. Ironman triathlete and mountaineer.
