
Published: June 2, 2026 · Last reviewed: May 1, 2026
The FDA's February 3, 2026 premarket cybersecurity guidance does not treat "a penetration test" as a single deliverable. It lists penetration testing alongside nine other testing and analysis activities: vulnerability testing per ANSI/ISA 62443-4-1, abuse/misuse cases, robustness, fuzz testing, attack surface analysis, vulnerability chaining, closed-box known-vulnerability scanning, software composition analysis of binaries, and static/dynamic code analysis (including hardcoded credential checks). A medical device pen test that only delivers exploitation findings satisfies at most one of those ten requirements. The other nine show up later as deficiency letter questions. The fix is a bundled pen test engagement that produces the attack surface artifact, executes the abuse/misuse cases, runs targeted fuzz and robustness testing, performs closed-box scanning, exploits and chains findings, and delivers a report with the five FDA-required elements: tester independence and expertise, scope, duration, methods, and results. Code-side activities (SAST, DAST, SBOM-driven SCA, hardcoded credential scanning, continuous fuzzing in CI) sit outside the pen test and remain the dev team's responsibility.
Key takeaways
- The FDA lists ten distinct testing activities. Penetration testing is one of them, not all of them.
- A narrow "exploit and report" pen test leaves nine other evidence gaps for reviewers to flag.
- Blue Goat's pen test engagement bundles attack surface analysis, abuse/misuse testing, fuzz testing, robustness testing, closed-box scanning, exploitation, and chaining into one evidence package.
- SAST, DAST, SBOM-driven SCA, hardcoded credential scanning, and continuous fuzzing in CI are separate dev-team activities and stay separate. The FDA wants evidence of both.
- The pen test report must include the five required elements: tester independence and expertise, scope, duration, methods, and results.
- ANSI/ISA 62443-4-1 §9 is the referenced process standard for vulnerability testing. The bundled engagement is structured to satisfy it.
The misconception that triggers deficiency letters
The most common pattern we see in deficiency letters: the manufacturer submits a clean pen test report, and the FDA reviewer comes back asking for fuzz testing evidence, abuse case testing, an attack surface analysis, or vulnerability chaining results. The manufacturer assumed "we did a pen test" covered all of it. It didn't.
The 2026 final guidance is explicit. Under the testing section, it lists penetration testing as one bullet inside a longer list of required activities:
- Vulnerability testing (as described in ANSI/ISA 62443-4-1)
- Abuse or misuse cases, malformed and unexpected inputs
- Robustness
- Fuzz testing
- Attack surface analysis
- Vulnerability chaining
- Closed-box testing of known vulnerability scanning
- Software composition analysis of binary executable files
- Static and dynamic code analysis, including testing for credentials that are hardcoded, default, easily guessed, and easily compromised
- Penetration testing
Each of those bullets generates an evidence expectation. If your pen test report only addresses the last one, reviewers will ask about the other nine.
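One way to keep the ten evidence expectations visible is a simple coverage check. The sketch below is illustrative only: the activity names paraphrase the list above, and the `evidence_gaps` helper and the sample `narrow` engagement are hypothetical.

```python
# The ten testing activities, paraphrased from the list above, in guidance order.
GUIDANCE_ACTIVITIES = [
    "vulnerability testing (ANSI/ISA 62443-4-1)",
    "abuse/misuse cases, malformed inputs",
    "robustness",
    "fuzz testing",
    "attack surface analysis",
    "vulnerability chaining",
    "closed-box known-vulnerability scanning",
    "software composition analysis of binaries",
    "static and dynamic code analysis",
    "penetration testing",
]

def evidence_gaps(submitted):
    """Return the activities that have no submitted evidence, in guidance order."""
    return [a for a in GUIDANCE_ACTIVITIES if a not in submitted]

# Hypothetical narrow engagement: exploitation findings only.
narrow = {"penetration testing"}
gaps = evidence_gaps(narrow)
print(f"{len(gaps)} of {len(GUIDANCE_ACTIVITIES)} activities lack evidence")
```

Running the same check against the full evidence package before submission is a cheap way to catch an uncovered bullet before a reviewer does.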
What's actually inside a medical device pen test
A pen test scoped for the 2026 guidance is not a one-week web app assessment. It is a multi-discipline engagement against a physical or software device, its companion mobile app, its cloud backend, its wireless interfaces, and its update channels. Done right, the engagement itself produces evidence for most of the testing bullets above, not just the "penetration testing" line item.
1. Attack surface analysis (as a written artifact)
The first phase enumerates every interface that can receive input: network ports, BLE, Wi-Fi, proprietary RF, USB, serial, JTAG/SWD, web UIs, REST and gRPC APIs, mobile app endpoints, OTA update channels, and any cloud-to-device or device-to-cloud paths. This is not just internal recon — it is delivered as a written attack surface analysis tied to the threat model, which is what the FDA wants as the standalone "attack surface analysis" deliverable.
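A written attack surface analysis is easier to keep honest when the inventory is structured data rather than free text. The following is a minimal sketch, assuming a hypothetical `Interface` record and placeholder threat model IDs (`T-01` and so on); it flags interfaces that have no threat model entry yet.

```python
from dataclasses import dataclass, field

@dataclass
class Interface:
    name: str                                        # e.g. "BLE GATT"
    category: str                                    # "wireless", "network", "debug", ...
    threat_ids: list = field(default_factory=list)   # linked threat model entries

def unlinked(interfaces):
    """Return names of interfaces with no linked threat model entry."""
    return [i.name for i in interfaces if not i.threat_ids]

# Hypothetical inventory; the threat IDs are placeholders.
inventory = [
    Interface("BLE GATT", "wireless", ["T-01", "T-04"]),
    Interface("REST API", "network", ["T-02"]),
    Interface("JTAG/SWD", "debug"),
    Interface("OTA update channel", "update", ["T-07"]),
]
print(unlinked(inventory))  # ['JTAG/SWD']
```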
2. Abuse and misuse case testing
The threat model defines abuse cases (an attacker actively trying to harm the patient or compromise data) and misuse cases (a clinician or patient using the device in foreseeable but unintended ways). The pen tester executes those cases against the built device — bypassing intended workflows, supplying malformed or out-of-sequence inputs, triggering safety interlocks, and confirming whether the device fails safely or fails dangerously.
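To make "fails safely" concrete, here is a toy sketch of out-of-sequence abuse cases replayed against a device state machine. `ToyPump`, its commands, and the abuse cases are all hypothetical stand-ins; a real engagement runs the equivalent cases against the built device.

```python
class ToyPump:
    """Hypothetical device model: therapy may start only after auth and dose entry."""

    ALLOWED = {
        ("idle", "authenticate"): "authenticated",
        ("authenticated", "set_dose"): "dose_set",
        ("dose_set", "start_therapy"): "delivering",
    }

    def __init__(self):
        self.state = "idle"

    def handle(self, command):
        next_state = self.ALLOWED.get((self.state, command))
        if next_state is None:
            # Fail safe: reject the command and fall back to idle.
            self.state = "idle"
            return False
        self.state = next_state
        return True

def run_case(commands):
    """Replay one abuse/misuse case and return the final device state."""
    pump = ToyPump()
    for command in commands:
        pump.handle(command)
    return pump.state

abuse_cases = {
    "skip authentication": ["set_dose", "start_therapy"],
    "skip dose entry": ["authenticate", "start_therapy"],
    "unknown command mid-sequence": ["authenticate", "set_dose", "debug_unlock", "start_therapy"],
}
results = {name: run_case(commands) for name, commands in abuse_cases.items()}
# A case passes when the device never reaches "delivering".
unsafe = [name for name, state in results.items() if state == "delivering"]
```

The pass criterion matters more than the mechanics: each case records which state the device ended in, so the report can show the failure mode rather than just "rejected".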
3. Robustness testing
Robustness here means the device's ability to maintain safe behavior under abnormal but plausible conditions: dropped network connections, malformed packets, power instability, interrupted firmware updates, sensor noise, and out-of-order protocol messages. Robustness testing is distinct from fuzzing — it targets operational stress, not input-parser bugs — but both belong in the same engagement.
4. Fuzz testing (targeted, engagement-scoped)
The pen tester runs targeted fuzzing against exposed interfaces: BLE GATT services, network protocol parsers, file format handlers, cloud APIs, and any custom binary protocols. The goal is to find crashes, hangs, memory corruption, and unsafe state transitions that conventional test cases miss. This is engagement-scoped fuzzing — not the continuous CI fuzzing that the dev team should also be running (more on that below).
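As a rough illustration of targeted fuzzing, the sketch below applies deterministic single-byte boundary mutations and truncations to one valid frame and records anything other than a clean rejection. The `parse_frame` target is a hypothetical toy parser with a planted length-field bug; real engagements typically use coverage-guided or protocol-aware fuzzers against the actual interface.

```python
def parse_frame(data: bytes) -> dict:
    """Toy frame parser: [type][length][payload...][checksum]. Hypothetical target."""
    if len(data) < 2:
        raise ValueError("frame too short")      # clean, handled rejection
    length = data[1]
    # Planted bug: trusts the length field without checking it against len(data).
    checksum = data[2 + length]                  # IndexError when the length lies
    payload = data[2:2 + length]
    if checksum != sum(payload) % 256:
        raise ValueError("bad checksum")
    return {"type": data[0], "payload": payload}

BOUNDARY_BYTES = (0x00, 0x01, 0x7F, 0x80, 0xFF)

def mutations(seed: bytes):
    """Yield deterministic single-byte boundary mutations, then truncations."""
    for pos in range(len(seed)):
        for value in BOUNDARY_BYTES:
            mutated = bytearray(seed)
            mutated[pos] = value
            yield bytes(mutated)
    for cut in range(len(seed)):
        yield seed[:cut]

def fuzz(target, seed: bytes):
    """Return (input, exception name) pairs for every unhandled failure."""
    crashes = []
    for case in mutations(seed):
        try:
            target(case)
        except ValueError:
            pass                                  # expected rejection, not a finding
        except Exception as exc:
            crashes.append((case, type(exc).__name__))
    return crashes

# Valid seed frame: type 0x01, length 3, payload 0A 0B 0C, checksum 0x21.
seed = bytes([0x01, 0x03, 0x0A, 0x0B, 0x0C, 0x21])
crashes = fuzz(parse_frame, seed)
for case, name in crashes:
    print(case.hex(), name)
```

Every crashing input is kept alongside the exception type, which is the raw material for the reproduction steps a finding needs.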
5. Closed-box known-vulnerability scanning
Automated scanning of the device, its exposed services, and any web/cloud surface for known CVEs and misconfigurations. Closed-box means without access to source code or internal documentation — the tester sees what an external attacker sees. This generates the closed-box scanning evidence the guidance asks for.
6. Vulnerability identification and exploitation
The defining activity of pen testing. Manual analysis, protocol reverse engineering, traffic inspection, firmware extraction, and active exploitation. Findings are not theoretical — they are proven with reproducible proof-of-concept evidence and CVSS scoring.
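Because every finding carries a CVSS score, it helps to be able to recompute the base score from the vector. The sketch below implements the CVSS v3.1 base score equations and metric weights from the FIRST specification; the function names are illustrative, and temporal and environmental metrics are left out.

```python
# Metric weights from the CVSS v3.1 specification (base metrics only).
AV = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
AC = {"L": 0.77, "H": 0.44}
UI = {"N": 0.85, "R": 0.62}
CIA = {"H": 0.56, "L": 0.22, "N": 0.0}
PR_UNCHANGED = {"N": 0.85, "L": 0.62, "H": 0.27}
PR_CHANGED = {"N": 0.85, "L": 0.68, "H": 0.5}

def roundup(x):
    """CVSS v3.1 Roundup: smallest one-decimal value >= x, avoiding float drift."""
    i = round(x * 100000)
    if i % 10000 == 0:
        return i / 100000.0
    return (i // 10000 + 1) / 10.0

def base_score(vector):
    """Compute the base score for a vector such as 'CVSS:3.1/AV:N/AC:L/...'."""
    m = dict(part.split(":") for part in vector.split("/")[1:])
    changed = m["S"] == "C"
    pr = (PR_CHANGED if changed else PR_UNCHANGED)[m["PR"]]
    iss = 1 - (1 - CIA[m["C"]]) * (1 - CIA[m["I"]]) * (1 - CIA[m["A"]])
    if changed:
        impact = 7.52 * (iss - 0.029) - 3.25 * (iss - 0.02) ** 15
    else:
        impact = 6.42 * iss
    if impact <= 0:
        return 0.0
    exploitability = 8.22 * AV[m["AV"]] * AC[m["AC"]] * pr * UI[m["UI"]]
    total = impact + exploitability
    if changed:
        total *= 1.08
    return roundup(min(total, 10))

# Adjacent-network attack (for example over BLE), no privileges, full impact.
print(base_score("CVSS:3.1/AV:A/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H"))  # 8.8
```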
7. Vulnerability chaining
The guidance lists this separately, but in practice it is performed by the pen tester. Chaining combines low- and medium-severity findings into high-impact attack paths — for example, an information disclosure plus a weak session token plus an unauthenticated API call becomes full remote control of therapy. Reviewers specifically look for chaining analysis because individual CVSS scores understate real-world risk.
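Chaining can be modeled as a path search: each finding requires one attacker capability and grants another. The sketch below is a simplified illustration using hypothetical findings that mirror the example above; real attack paths usually have more than one precondition per step.

```python
from collections import deque

# Hypothetical findings: each needs one capability and grants another.
FINDINGS = [
    {"id": "F-01", "severity": "low", "needs": "network access", "grants": "device serial number"},
    {"id": "F-02", "severity": "medium", "needs": "device serial number", "grants": "valid session token"},
    {"id": "F-03", "severity": "medium", "needs": "valid session token", "grants": "therapy control"},
    {"id": "F-04", "severity": "low", "needs": "physical access", "grants": "debug log read"},
]

def find_chain(findings, start, goal):
    """Breadth-first search for the shortest chain of finding IDs from start to goal."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        capability, path = queue.popleft()
        if capability == goal:
            return path
        for f in findings:
            if f["needs"] == capability and f["grants"] not in seen:
                seen.add(f["grants"])
                queue.append((f["grants"], path + [f["id"]]))
    return None

chain = find_chain(FINDINGS, "network access", "therapy control")
print(chain)  # ['F-01', 'F-02', 'F-03']
```

None of the three chained findings is rated high on its own, which is exactly why the attack path needs to be reported as a separate result.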
8. Post-exploitation
What an attacker could actually do after a successful compromise: persistence on the device, lateral movement from the device to the cloud backend or to other devices on the same network, data exfiltration, denial of therapy, and integrity attacks on logged clinical data.
9. Hardware, RF, wireless, mobile, and cloud coverage
For SiMD (Software in a Medical Device), this includes JTAG/SWD debug access, chip-off and glitching attacks, firmware extraction from flash, Secure Boot bypass attempts, BLE pairing and authentication attacks, Wi-Fi configuration weaknesses, proprietary RF replay and spoofing, mobile app reverse engineering, certificate pinning bypass, and full cloud API testing including IDOR, token handling, and authentication boundaries.
10. The reviewer-format report
The deliverable that the FDA actually reads. It must contain the five elements explicitly called out in the guidance: independence and technical expertise of testers, scope of testing, duration of testing, testing methods employed, and test results, findings, and observations. We cover the report format in detail further down.
What is NOT in a pen test (and why)
Being honest about the boundary is part of the value. The following activities are required by the 2026 guidance but sit outside the pen test engagement. They are continuous, code-side, or CI-integrated activities performed by the development team, and they need their own evidence:
SAST (static application security testing)
Source-code-level scanning for vulnerabilities, dangerous APIs, insecure patterns, and hardcoded credentials. SAST requires source access, runs on every commit in CI, and is owned by the development team. A pen tester operating in a closed-box engagement cannot produce SAST evidence.
DAST (dynamic application security testing)
Automated runtime scanning of running applications, typically integrated into staging environments and CI pipelines. DAST overlaps in spirit with what pen testers do manually, but the FDA wants the automated, continuous, dev-owned evidence — not just an annual engagement snapshot.
SCA on binary executables (SBOM-driven)
Software composition analysis against binaries — extracting an SBOM, matching components to known CVEs, and tracking the exploitability status of each finding in a VEX document. This is ongoing post-build work tied to the SBOM and VEX program, not a one-time pen test activity.
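The matching step can be sketched in a few lines. Everything in this example is a placeholder: the component names, versions, and `VULN-` identifiers are invented, and exact name-and-version matching is a simplification of what real SCA tools do. The status values follow the common VEX vocabulary (`not_affected`, `affected`, `fixed`, `under_investigation`).

```python
# Hypothetical SBOM extract and vulnerability feed; not real components or advisories.
sbom = [
    {"name": "libexample-tls", "version": "1.2.0"},
    {"name": "tinyjson", "version": "0.9.1"},
    {"name": "rtos-kernel", "version": "4.0.3"},
]
vuln_feed = [
    {"id": "VULN-0001", "name": "libexample-tls", "version": "1.2.0"},
    {"id": "VULN-0002", "name": "tinyjson", "version": "0.8.0"},
    {"id": "VULN-0003", "name": "rtos-kernel", "version": "4.0.3"},
]
# Exploitability decisions recorded so far, keyed by vulnerability ID.
vex = {"VULN-0001": "not_affected"}

def match(sbom, feed, vex):
    """Match feed entries to SBOM components and attach the recorded VEX status."""
    components = {(c["name"], c["version"]) for c in sbom}
    return [
        {
            "id": v["id"],
            "component": v["name"],
            "status": vex.get(v["id"], "under_investigation"),
        }
        for v in feed
        if (v["name"], v["version"]) in components
    ]

open_items = [m for m in match(sbom, vuln_feed, vex) if m["status"] == "under_investigation"]
```

Anything still `under_investigation` at submission time is an open question a reviewer can reasonably ask about.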
Hardcoded credential scanning
The guidance specifically calls out testing for hardcoded, default, easily guessed, and easily compromised credentials. This is primarily a SAST output (greps and entropy checks against source) plus secrets-scanning hooks on the repository. A pen tester will surface any credentials they find during exploitation, but the systematic coverage is a dev-team CI activity.
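A minimal sketch of the pattern-plus-entropy approach is shown below. The keyword list, the weak-value list, and the thresholds (16 characters, 3.5 bits per character) are assumptions chosen for illustration; dedicated secrets scanners use far larger rule sets.

```python
import math
import re
from collections import Counter

# Assumed pattern: a credential-like name assigned a quoted string literal.
ASSIGNMENT = re.compile(
    r"""(password|passwd|secret|api_key|token)\s*[=:]\s*["']([^"']+)["']""",
    re.IGNORECASE,
)
WEAK_VALUES = {"admin", "password", "1234", "default"}  # illustrative only

def shannon_entropy(s):
    """Shannon entropy of a string, in bits per character."""
    counts = Counter(s)
    return -sum((n / len(s)) * math.log2(n / len(s)) for n in counts.values())

def scan(source):
    """Return (line number, reason) for each credential-like literal in source."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        m = ASSIGNMENT.search(line)
        if not m:
            continue
        value = m.group(2)
        if value.lower() in WEAK_VALUES:
            findings.append((lineno, "default or easily guessed"))
        elif len(value) >= 16 and shannon_entropy(value) >= 3.5:
            findings.append((lineno, "high-entropy literal"))
        else:
            findings.append((lineno, "hardcoded literal"))
    return findings

sample = "\n".join([
    'user = "clinician"',
    'password = "admin"',
    'api_key = "q7Zk2Lx9Pv4Tn8Rb1Wc6"',
    "timeout = 30",
])
print(scan(sample))
```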
Continuous fuzzing in CI
Different from the targeted fuzzing in the pen test. Continuous fuzzing runs against parsers, protocol handlers, and library boundaries on every build or nightly, accumulating coverage and corpus over months. The pen test fuzzing is targeted and time-boxed. Mature programs do both, and the FDA wants evidence of both.
The clean way to talk about this in your submission: the pen test report covers the engagement-scoped activities, and a separate "secure development testing evidence" section covers the continuous SAST, DAST, SCA, secrets scanning, and CI fuzzing outputs.
ANSI/ISA 62443-4-1 §9 — the referenced process standard
The guidance explicitly points to ANSI/ISA 62443-4-1 for vulnerability testing. The relevant section is §9, which defines the security verification and validation testing process. The short version of what §9 expects:
- §9.2 Security requirements testing — verifying that every security requirement has a corresponding test case and evidence.
- §9.3 Threat mitigation testing — verifying that each threat in the threat model has at least one test case that exercises the mitigation.
- §9.4 Vulnerability testing — the testing activities listed earlier in this post (abuse cases, fuzzing, attack surface, etc.).
- §9.5 Penetration testing — adversarial exploitation against the integrated product.
A bundled pen test engagement designed around 62443-4-1 §9 naturally produces traceability between the threat model, the test cases, and the test results. That traceability is what survives reviewer scrutiny.
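The traceability described here can be checked mechanically. The sketch below assumes a hypothetical record layout (threat IDs, test cases that reference a threat, and a result field) and reports threats with no test case and test cases with no recorded result.

```python
# Hypothetical threat model and test records; all IDs are placeholders.
threats = ["T-01", "T-02", "T-03"]
test_cases = [
    {"id": "TC-101", "threat": "T-01", "result": "pass"},
    {"id": "TC-102", "threat": "T-01", "result": "fail"},
    {"id": "TC-201", "threat": "T-02", "result": None},   # planned, not yet run
]

def traceability(threats, test_cases):
    """Build a threat -> test cases matrix, keeping threats that have no tests."""
    matrix = {t: [] for t in threats}
    for tc in test_cases:
        matrix.setdefault(tc["threat"], []).append(tc)
    return matrix

def trace_gaps(matrix):
    """Report threats without test cases and test cases without results."""
    untested = [t for t, tcs in matrix.items() if not tcs]
    no_result = [tc["id"] for tcs in matrix.values() for tc in tcs if tc["result"] is None]
    return {"threats_without_tests": untested, "tests_without_results": no_result}

matrix = traceability(threats, test_cases)
print(trace_gaps(matrix))
```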
The 5 required pen test report elements (and reviewer red flags)
The guidance explicitly requires that pen test reports include all five of the following. Missing or weak coverage of any one almost always draws a deficiency letter.
1. Independence and technical expertise of testers
Reviewers want named testers, their credentials (OSCP, OSCE, GXPN, CRTO, hardware-specific certifications), and clear separation from the development team. Red flag: a report with no tester names, no credentials, or testers who are also listed as developers on the device.
2. Scope of testing
A precise enumeration of what was in scope and what was out of scope — and the rationale for any exclusions. Red flag: vague scope ("the device and its companion app") with no interface-level breakdown, or out-of-scope items that are obvious attack paths from the threat model.
3. Duration of testing
Actual tester-days, not calendar duration. Reviewers know what realistic durations look like for a given device complexity. Red flag: a complex connected device "pen tested" in three days. That signals checkbox testing.
4. Testing methods employed
The methodology (OWASP, PTES, NIST SP 800-115), the tools used, the test cases run, and the rationale for the chosen approach. Red flag: "industry-standard methodology" with no specifics.
5. Test results, findings, and observations
Findings with CVSS scores, reproduction steps, evidence (screenshots, packet captures, exploit code), remediation recommendations, and retest results after fixes. Red flag: findings without evidence, or no retest section.
What a reviewer-format report actually looks like
"Reviewer-format" means a report structured so the FDA cybersecurity reviewer can drop it straight into the eSTAR cybersecurity attachments and find each of the five required elements within thirty seconds. The structure matters as much as the content — a report with all the right information buried in the wrong places still triggers deficiencies.
Document structure that survives review
Section                          Contents
-------------------------------  -----------------------------------------------------------------------------------------------
§1 Executive Summary             Tester independence, scope, duration, headline results
§2 Scope and Methodology         In-scope/out-of-scope interfaces, versions, environment, methods, tools
§3 Testing Activities            One subsection per guidance bullet (attack surface, abuse cases, robustness, fuzz, closed-box scan)
§4 Exploitation and Chaining     Findings table + attack path analysis
§5 Findings Detail               Per-finding pages with CVSS, repro, evidence, recommendation
§6 Retest Results                What was fixed, what was verified, residual risk
§7 Traceability Matrix           Threat model entry → test case → result
§A Tester Bios and Credentials   Named testers, certifications, independence statement
A reviewer can confirm all five required elements without leaving §1 and §2. The detail in §3–§7 is there for the technical review that follows.
What each finding entry should contain
Every finding in §5 should be a self-contained page with:
- Title and unique finding ID
- CVSS v3.1 vector and score (base, temporal if relevant)
- Affected component and version
- Description of the vulnerability
- Reproduction steps that a reviewer could follow
- Evidence: screenshots, packet captures, exploit code, decompiled snippets
- Impact tied to patient safety and the threat model entry it maps to
- Recommended remediation
- Retest result and date
Findings without CVSS scores, without reproduction steps, or without evidence are the most common deficiency trigger in this section.
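A completeness check over the finding records catches these gaps before the report ships. The field names below are an assumed schema derived from the list above, and the sample finding is hypothetical.

```python
# Assumed schema: one key per element of the finding entry list above.
REQUIRED_FIELDS = (
    "finding_id", "title", "cvss_vector", "cvss_score", "component", "version",
    "description", "repro_steps", "evidence", "impact", "threat_id",
    "remediation", "retest_result", "retest_date",
)

def missing_fields(finding):
    """Return required fields that are absent or empty (None, "", or [])."""
    return [k for k in REQUIRED_FIELDS if finding.get(k) in (None, "", [])]

# Hypothetical draft finding: no evidence attached and not yet retested.
draft = {
    "finding_id": "F-02",
    "title": "Predictable session token",
    "cvss_vector": "CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:N",
    "cvss_score": 9.1,
    "component": "cloud API",
    "version": "2.4.1",
    "description": "Session tokens are derived from the device serial number.",
    "repro_steps": ["Read serial via F-01", "Derive token", "Call the session endpoint"],
    "evidence": [],
    "impact": "Unauthorized access to therapy settings",
    "threat_id": "T-02",
    "remediation": "Issue random, server-generated tokens.",
    "retest_result": None,
}
print(missing_fields(draft))  # ['evidence', 'retest_result', 'retest_date']
```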
Front-matter that answers the 5 elements immediately
The executive summary should explicitly call out, in this order:
- Independence statement — "Testing was performed by [named testers] of [firm], who have no development relationship with [manufacturer]. Tester credentials are listed in Appendix A."
- Scope summary — one paragraph naming every interface tested and every interface excluded with rationale.
- Duration — actual tester-days, broken out by phase if multi-discipline.
- Methodology — named frameworks (OWASP MASTG, PTES, NIST SP 800-115) and primary tools.
- Results summary — finding counts by severity, retest status, and residual risk statement.
A reviewer reading only the first two pages should be able to check off all five required elements.
Anti-patterns that trigger deficiencies
Reports that fail review usually share one or more of these traits:
- Marketing-style executive summary with vendor logos on every page and no scope detail
- "Approximately two weeks" instead of actual tester-days
- "Industry-standard methodology" with no named frameworks or tools
- Findings without CVSS scores, without reproduction steps, or without evidence
- No retest section — fixes are "recommended" but never verified
- No named testers or credentials — just a firm name
- Out-of-scope items that are obvious attack paths from the threat model (e.g., excluding BLE on a BLE-connected device)
- Findings buried in a 60-page appendix with no severity table up front
How a bundled engagement prevents deficiency letters
The pattern is consistent across the deficiency letters we respond to: reviewers ask the manufacturer to produce evidence for one of the testing activities that wasn't covered. The fix at submission time is to anticipate the question.
A bundled pen test report explicitly maps each section back to the 2026 guidance bullet it satisfies:
Guidance bullet                        Report section
-------------------------------------  --------------------------------------
Attack surface analysis                §3.1 Attack Surface Analysis
Abuse/misuse cases, malformed inputs   §3.2 Abuse and Misuse Case Testing
Robustness                             §3.3 Robustness Testing
Fuzz testing                           §3.4 Targeted Fuzz Testing
Closed-box known-vuln scanning         §3.5 Closed-Box Vulnerability Scan
Vulnerability identification           §4.1 Findings
Vulnerability chaining                 §4.2 Attack Path Analysis
Penetration testing                    §4 Exploitation Results
Required report elements (1-5)         §1 Executive Summary, §2 Scope/Methods
Reviewers reading that table know immediately that the testing section of the submission is covered. The questions stop before they start.
FAQs
My current vendor's pen test didn't include fuzz testing — is that enough for the FDA?
No. The guidance lists fuzz testing as a distinct required activity. You need engagement-scoped fuzz testing in the pen test plus, ideally, continuous fuzzing evidence from CI. A pen test that doesn't address fuzzing leaves a known gap reviewers will flag.
Do I still need SAST and DAST if the pen test covers everything else?
Yes. SAST and DAST are separate guidance bullets, performed by the dev team on every build. The pen test cannot replace them and is not intended to. Submit both.
Is robustness testing the same as fuzz testing?
No. Robustness targets operational stress (dropped connections, power instability, malformed packets, environmental conditions). Fuzz testing targets parser and protocol bugs from malformed input. They cover different failure classes and the guidance lists them separately.
Who counts as an "independent" tester?
A tester who is not on the development team, does not report to the development organization, and has no conflict of interest in the findings. Internal security teams can qualify if organizationally separate, but most reviewers prefer external third parties for the primary pen test.
How long should a medical device pen test take?
For a typical connected SiMD with a mobile app and cloud backend, two to four weeks of active testing is realistic. SaMD-only with a single cloud surface can be shorter. Hardware-intensive devices with custom RF or implantable components run longer. A three-day "pen test" on a complex connected device is a red flag.
Does the 2026 guidance change what pen testing looks like vs the 2023 guidance?
The bullet list of required testing activities is consistent, but the 2026 final guidance ties pen testing more tightly to the threat model and to Section 524B's "reasonable assurance of cybersecurity" standard. The bar for evidence and traceability is higher than under the 2023 guidance.
Final thoughts
The shortest path through an FDA cybersecurity review is a pen test engagement that produces evidence for as many of the ten testing requirements as one engagement can cover, plus a clear handoff to the dev team's CI-side evidence for the rest. Narrow pen tests don't fail because they're bad. They fail because they leave the manufacturer holding nine other evidence gaps the reviewer is going to ask about.
If you want a pen test engagement scoped against the Feb 3, 2026 guidance bullets — with the attack surface analysis, abuse case testing, fuzz testing, robustness testing, closed-box scanning, exploitation, chaining, and a reviewer-format report all delivered as one package — contact us and we'll scope it.
Need a written gap check against these requirements? Our Medical Device Pen Test Requirements gap check returns a written analysis against Section 524B, the Feb 2026 guidance, and AAMI TIR57 within one business day, free of charge.
Related reading: Special vs Traditional 510(k) for cybersecurity changes, Letter to file vs new 510(k) for cybersecurity changes, Fuzz testing in medical device cybersecurity, Closed-box testing in medical device cybersecurity, 12 critical findings from medical device pen tests, and our pen test service page at Medical Device Penetration Testing Services.