
    Penetration Test Case Design for Medical Devices

    How to design penetration test cases from a medical device threat model — the methodology that bridges STRIDE-style threats and concrete bench test execution, with traceability the FDA expects in Slot 7.


    By Christian Espinosa, MBA, CISSP

    Founder & CEO · Blue Goat Cyber

    Published: June 11, 2026


    Direct answer

    Penetration test case design for medical devices is the methodology that bridges the threat model and the bench. The FDA's February 3, 2026 final premarket cybersecurity guidance expects every penetration test result in eSTAR v7.0 Slot 7 to trace back to a specific threat in Slot 3 — which requires explicit test cases, not just exploratory effort. A defensible test case has six elements: threat ID, hypothesis, preconditions, procedure, expected vs actual result, and disposition. Reviewers credit traceability; they discount narrative.

    Key Takeaways

    • Pen test case design is the bridge between scope (which interfaces are in play) and test execution.
    • Every test case maps to one or more threats in the Slot 3 threat model.
    • A defensible case has 6 elements: threat ID, hypothesis, preconditions, procedure, expected/actual, disposition.
    • Test cases should cover positive, negative, abuse, edge-condition, and chained-attack scenarios.
    • The traceability matrix between threats and test cases is itself a Slot 7 deliverable.


    Why this matters

    Two pen-test reports can have identical scopes and identical finding counts yet fare very differently at FDA review. The difference is almost always traceability. A report organized by URL or by tool output forces the reviewer to do the cross-referencing. A report organized by test case, with each case mapped back to a Slot 3 threat, lets the reviewer credit coverage in one read.

    FDA language

    "Penetration testing should be scoped to the threat model and architecture views, with results documented in a manner that supports traceability between identified threats, tests performed, and findings reported."

    Skipping test case design is also the source of the most expensive engagement waste. A pen test that runs without case design tends to over-cover the easy interfaces (web/API) and under-cover the hard ones (BLE, USB, JTAG, firmware update). Case design forces an early conversation about coverage proportional to risk.

    The 6-element test case structure

    Every test case in a defensible pen-test plan contains:

    1. Threat ID(s) — the Slot 3 threat(s) the case exercises. A case can exercise multiple threats; a threat usually needs multiple cases.
    2. Hypothesis — one sentence stating what the test will attempt to prove or disprove. ("An unauthenticated attacker on the local network can pair a malicious mobile device to the implant via BLE and inject a therapy command.")
    3. Preconditions — device state, network state, account state, hardware setup. Documented so the test is reproducible.
    4. Procedure — the steps. Tool-specific where useful (Burp scope, Bettercap profile, HackRF settings) but written so another tester could execute.
    5. Expected vs actual result — the expected outcome from the control's design intent, and the actual outcome observed.
    6. Disposition — pass / fail / partial / blocked, with finding ID where applicable and remediation pointer.

    This structure is what makes Slot 7 traceable and what reviewers credit.
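
    As a minimal sketch, the six elements above could be captured in a small record type so that incomplete cases are caught before execution. The class name, field names, ID formats (TC-BLE-001, T-014), and the sample preconditions and procedure steps are illustrative assumptions, not a prescribed schema; only the hypothesis is taken from the example above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Disposition(Enum):
    PASS = "pass"
    FAIL = "fail"
    PARTIAL = "partial"
    BLOCKED = "blocked"


@dataclass
class PenTestCase:
    case_id: str                               # hypothetical ID scheme
    threat_ids: List[str]                      # 1. Slot 3 threat(s) the case exercises
    hypothesis: str                            # 2. one sentence to prove or disprove
    preconditions: List[str]                   # 3. device, network, account, hardware state
    procedure: List[str]                       # 4. steps another tester could execute
    expected_result: str                       # 5. outcome implied by the control's design
    actual_result: Optional[str] = None        # 5. observed outcome, filled in at the bench
    disposition: Optional[Disposition] = None  # 6. set when the case closes
    finding_id: Optional[str] = None           # 6. finding reference, where applicable

    def missing_plan_elements(self) -> List[str]:
        """Names of planning-time elements that are still empty."""
        required = {
            "threat_ids": self.threat_ids,
            "hypothesis": self.hypothesis,
            "preconditions": self.preconditions,
            "procedure": self.procedure,
            "expected_result": self.expected_result,
        }
        return [name for name, value in required.items() if not value]

    def is_closed(self) -> bool:
        """A case is closed once it has an actual result and a disposition."""
        return self.actual_result is not None and self.disposition is not None


# Illustrative case; the threat ID and bench details are made up for the sketch.
case = PenTestCase(
    case_id="TC-BLE-001",
    threat_ids=["T-014"],
    hypothesis=(
        "An unauthenticated attacker on the local network can pair a malicious "
        "mobile device to the implant via BLE and inject a therapy command."
    ),
    preconditions=["Implant in normal operating mode", "Attacker phone never paired"],
    procedure=["Scan for the implant", "Attempt pairing", "Attempt a therapy command write"],
    expected_result="Pairing is rejected and no therapy command is accepted",
)
```

    Here `case.missing_plan_elements()` returns an empty list because every planning element is filled in, while `case.is_closed()` stays false until the bench team records an actual result and a disposition.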

    How to derive cases from a STRIDE threat model

    The threat model in Slot 3 names the threats. The test plan in Slot 7 has to exercise them. The cleanest derivation we have found is to walk each threat through three questions:

    1. Confirm the threat is reachable. What's the simplest case that proves an attacker can get to the attack surface where this threat lives?
    2. Confirm the control works. What's the case that exercises the control under its design conditions and verifies expected behavior?
    3. Stress the control. What's the case that probes the edges — wrong input format, exhausted resources, timing manipulation, partial failure?

    For example, a STRIDE-Tampering threat on a BLE firmware update flow yields, at minimum, these cases:

    • Reachability. Confirm an attacker within BLE range can connect to the update characteristic without authentication. (Hypothesis: connection is not gated.)
    • Control under design conditions. Submit a validly-signed update; verify it is accepted, applied, and logged.
    • Control under stress. Submit an update with a tampered payload but a valid signature header; submit an older update that still carries a valid signature; submit during low battery; interrupt the transfer midway; submit an update signed with the key from a sibling product.

    Coverage stops when the threat is credibly mitigated, credibly demonstrated as a finding, or credibly documented as out of scope with rationale.
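
    The three-question walk can be sketched as a small generator of case stubs, one per question, with the stress question fanned out into one stub per variant. The function name, stub fields, and the threat ID T-021 are hypothetical; the stress variants are the ones listed in the BLE firmware update example.

```python
from dataclasses import dataclass
from typing import List, Sequence

# The three derivation questions, in the order the text walks them.
QUESTIONS = [
    ("reachability", "Can an attacker get to the attack surface where this threat lives?"),
    ("design-conditions", "Does the control behave as expected under its design conditions?"),
    ("stress", "Does the control hold at the edges?"),
]


@dataclass
class CaseStub:
    case_id: str
    threat_id: str
    kind: str
    prompt: str  # the question or stress variant the tester turns into a hypothesis


def derive_stubs(threat_id: str, stress_variants: Sequence[str] = ()) -> List[CaseStub]:
    """Return at least three stubs for a threat; each stress variant gets its own stub."""
    stubs = []
    for kind, question in QUESTIONS:
        if kind == "stress" and stress_variants:
            for n, variant in enumerate(stress_variants, start=1):
                stubs.append(CaseStub(f"{threat_id}-stress-{n}", threat_id, kind, variant))
        else:
            stubs.append(CaseStub(f"{threat_id}-{kind}", threat_id, kind, question))
    return stubs


# Stress variants from the BLE firmware update example; "T-021" is a made-up threat ID.
ble_update_stubs = derive_stubs(
    "T-021",
    stress_variants=[
        "Tampered payload with a valid signature header",
        "Older update with a valid signature",
        "Update submitted during low battery",
        "Transfer interrupted midway",
        "Update signed with a sibling product's key",
    ],
)
```

    For the BLE example this yields seven stubs: one reachability case, one design-conditions case, and five stress cases. Each stub still has to be written up in the full 6-element structure before it counts as a test case.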

    For depth on the threat model side see our STRIDE threat modeling guide and 12 critical threat modeling gaps.

    Coverage axes every test plan needs

    A pen-test plan that misses any of these axes invites a deficiency. The plan should explicitly state coverage and rationale on each.

    See also: DAST vs Penetration Testing: What the FDA Requires, Does the FDA Accept AI Pen Testing for Medical Devices?, and Docker Containers in Medical Devices: What the FDA Expects You to Test.

    | Axis | What to cover | Common omission |
    |---|---|---|
    | Interface | Network, BLE, Wi-Fi, cellular, NFC, USB, debug ports, firmware update | Hardware interfaces |
    | Role | Unauthenticated, patient, clinician, service tech, manufacturer, machine-to-machine | Service / vendor role |
    | Authentication state | Pre-auth, authenticated, authenticated-but-expired-session | Expired-session paths |
    | Trust boundary | Each boundary identified in Slot 3 architecture views | Cloud-to-device boundary |
    | Protocol | Web, REST/GraphQL, HL7, FHIR, DICOM, BLE GATT, ASTM, USB CDC/Mass-Storage | Medical protocols |
    | Failure mode | Network drop, power loss, partial input, exhausted resources | Resource exhaustion |
    | Chained attacks | Multi-step paths combining low-severity findings into a clinically-relevant outcome | Almost always missed |

    Chained attacks are the single most under-tested axis. Reviewers and clinicians both care about end-to-end exploit paths, not isolated findings.
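
    One way to enforce the "explicit coverage and rationale on each axis" rule is a simple completeness check over the plan. This is a sketch under assumptions: the plan is represented as a plain dictionary keyed by axis name, and the `coverage` and `rationale` keys are hypothetical field names.

```python
from typing import Dict, List

# The seven coverage axes from the table above.
AXES = [
    "interface",
    "role",
    "authentication state",
    "trust boundary",
    "protocol",
    "failure mode",
    "chained attacks",
]


def coverage_gaps(plan: Dict[str, Dict[str, str]]) -> List[str]:
    """Return axes for which the plan lacks a coverage statement or a rationale."""
    gaps = []
    for axis in AXES:
        entry = plan.get(axis, {})
        has_coverage = bool(entry.get("coverage", "").strip())
        has_rationale = bool(entry.get("rationale", "").strip())
        if not (has_coverage and has_rationale):
            gaps.append(axis)
    return gaps


# Illustrative plan: one axis has no rationale and one axis is missing entirely.
draft_plan = {
    "interface": {"coverage": "BLE, USB, firmware update", "rationale": "All external interfaces in Slot 3"},
    "role": {"coverage": "Unauthenticated, clinician, service tech", "rationale": "Roles defined in the architecture views"},
    "authentication state": {"coverage": "Pre-auth, authenticated, expired session", "rationale": "Session handling is a named threat"},
    "trust boundary": {"coverage": "Cloud-to-device, app-to-device", "rationale": "Boundaries from Slot 3"},
    "protocol": {"coverage": "BLE GATT, REST", "rationale": "Only protocols the device speaks"},
    "failure mode": {"coverage": "Power loss, network drop", "rationale": ""},
}

gaps = coverage_gaps(draft_plan)  # ["failure mode", "chained attacks"]
```

    An axis that is deliberately out of scope still passes the check as long as the plan says so and explains why, which matches the requirement that coverage and rationale be stated rather than implied.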

    Common deficiency patterns

    1. Findings without test cases. The pen test report lists vulnerabilities but no test cases or threat traceability. "Provide the test plan and the mapping between threats in the threat model and tests performed."
    2. Threats without test cases. Slot 3 has a threat that no Slot 7 case exercises. "Provide test evidence for [threat ID]."
    3. No negative cases. Every case is positive; the test plan never tries to break a control. Reviewers ask for "evidence that controls were exercised under adverse conditions."
    4. No chained-attack cases. Findings stand alone; no case demonstrates the clinically-relevant end-to-end path.
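
    The four patterns above lend themselves to a pre-submission self-check. The sketch below assumes each case is a dictionary with `threat_ids`, `kind`, and `finding_ids` keys and that case kinds follow the labels from the Key Takeaways (positive, negative, abuse, edge-condition, chained); those field names and the treatment of negative, abuse, and edge-condition cases as "adverse" are assumptions made for illustration.

```python
from typing import Dict, Iterable, List

# Case kinds that count as trying to break a control (an assumption for this sketch).
ADVERSE_KINDS = {"negative", "abuse", "edge-condition"}


def deficiency_check(
    threat_ids: Iterable[str],
    finding_ids: Iterable[str],
    cases: List[Dict],
) -> Dict[str, object]:
    """Flag the four common deficiency patterns in a test plan."""
    exercised = {t for c in cases for t in c["threat_ids"]}
    traced = {f for c in cases for f in c.get("finding_ids", [])}
    kinds = {c["kind"] for c in cases}
    return {
        "findings_without_cases": sorted(set(finding_ids) - traced),
        "threats_without_cases": sorted(set(threat_ids) - exercised),
        "no_negative_cases": not (kinds & ADVERSE_KINDS),
        "no_chained_cases": "chained" not in kinds,
    }


# Illustrative data; all IDs are made up.
sample_cases = [
    {"case_id": "TC-001", "threat_ids": ["T-001"], "kind": "positive", "finding_ids": []},
    {"case_id": "TC-002", "threat_ids": ["T-001", "T-002"], "kind": "negative", "finding_ids": ["F-01"]},
]

report = deficiency_check(
    threat_ids=["T-001", "T-002", "T-003"],
    finding_ids=["F-01", "F-02"],
    cases=sample_cases,
)
```

    For this sample, the report flags finding F-02 as untraced, threat T-003 as unexercised, and the absence of any chained-attack case, while the negative-case check passes because TC-002 exists.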

    How Blue Goat Cyber designs test cases

    Our test case design starts with your Slot 3 threat model and walks every threat through reachability, design-conditions, and stress cases. We add chained-attack cases for any combination of findings that would alter the patient-safety conclusion. Cases are documented in the 6-element structure before execution begins, so the bench team is executing a plan, not improvising. The traceability matrix from Slot 3 → Slot 7 is built incrementally as cases close. The result is a Slot 7 package the reviewer can credit on first read.

    If the FDA raises cybersecurity deficiencies after our submission, we resolve them at no additional cost. See our medical device penetration testing service and the companion posts on scoping a medical device penetration test and DAST vs penetration testing.

    FAQ

    How many test cases should a medical device pen test have?

    It depends on the threat model. For a typical connected device with ~30–60 threats in Slot 3, expect 100–300 cases — most threats need 3–5 cases to cover reachability, design conditions, and stress. Devices with significant hardware or radio surface will trend higher.
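
    The arithmetic behind that range is easy to reproduce. The helper below is a back-of-the-envelope sketch with a hypothetical function name; it only multiplies the threat count by the cases-per-threat figures quoted above and does not account for extra hardware or radio surface.

```python
from typing import Tuple


def estimate_case_range(
    threats_low: int,
    threats_high: int,
    per_threat_low: int = 3,
    per_threat_high: int = 5,
) -> Tuple[int, int]:
    """Rough lower and upper bounds on planned cases for a threat-count range."""
    return threats_low * per_threat_low, threats_high * per_threat_high


low, high = estimate_case_range(30, 60)  # (90, 300)
```

    With 30 to 60 threats and 3 to 5 cases each, the bounds come out at 90 and 300, roughly in line with the 100 to 300 cases quoted above.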

    Should test cases be written before the engagement starts?

    Yes, for the core coverage cases derived from the threat model. The pen test will inevitably generate additional cases mid-engagement as testers discover new attack surface. Both planned and discovered cases belong in the final report.

    Who writes the test cases — the manufacturer or the pen-test vendor?

    The pen-test vendor writes them, but they should be reviewed by the manufacturer's cybersecurity and clinical leads before execution. Clinical input is what catches the chained-attack cases that matter to patient safety.

    How is this different from scoping the pen test?

    Scoping defines the boundaries — which interfaces, which environment, which build, which timeframe. Case design defines the content within the scope — which threats get exercised and how. Both are required.

    Can the same test case be used across multiple devices?

    Coverage templates can be reused (e.g., "BLE pairing under MITM" applies to any BLE device). Specific cases — with preconditions and procedures tied to the device — should be written per device because the threat model is per device.

    Where does the traceability matrix live?

    In eSTAR v7.0 Slot 7 as an appendix to the test report, with one row per threat ID and columns for the test case(s), finding(s), and residual-risk argument. Most accepted submissions format it as a spreadsheet attachment.
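
    A matrix with that shape can be generated from the case records rather than maintained by hand. The sketch below assumes cases are dictionaries with `case_id`, `threat_ids`, and `finding_ids` keys and writes CSV with Python's standard `csv` module; the column names and all IDs are illustrative, not a required format.

```python
import csv
import io
from typing import Dict, List

COLUMNS = ["threat_id", "test_cases", "findings", "residual_risk_argument"]


def build_matrix(
    threat_ids: List[str],
    cases: List[Dict],
    residual_risk: Dict[str, str],
) -> List[Dict[str, str]]:
    """One row per threat ID, listing linked cases, findings, and the residual-risk argument."""
    rows = []
    for tid in threat_ids:
        linked = [c for c in cases if tid in c["threat_ids"]]
        # dict.fromkeys removes duplicate finding IDs while keeping their order.
        findings = list(dict.fromkeys(f for c in linked for f in c.get("finding_ids", [])))
        rows.append({
            "threat_id": tid,
            "test_cases": "; ".join(c["case_id"] for c in linked),
            "findings": "; ".join(findings),
            "residual_risk_argument": residual_risk.get(tid, ""),
        })
    return rows


def to_csv(rows: List[Dict[str, str]]) -> str:
    """Serialize the matrix rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Illustrative data; all IDs and the risk argument are made up.
matrix_cases = [
    {"case_id": "TC-001", "threat_ids": ["T-001"], "finding_ids": []},
    {"case_id": "TC-002", "threat_ids": ["T-001", "T-002"], "finding_ids": ["F-01"]},
]
matrix = build_matrix(
    ["T-001", "T-002", "T-003"],
    matrix_cases,
    {"T-001": "Mitigated by authenticated pairing"},
)
matrix_csv = to_csv(matrix)
```

    A row with an empty `test_cases` cell, such as T-003 here, is exactly the "threat without test cases" gap a reviewer would ask about, so it is worth resolving before the matrix goes into Slot 7.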

    Ready to design a defensible test plan?

    If you have a threat model in hand and need a pen-test plan with traceable test cases, we will build one scoped to your threat model and execute it on a calibrated bench. Request a scoping call.


    Christian Espinosa — Founder, Blue Goat Cyber. CISSP, ex-military red team. Has designed pen-test cases for more than 250 FDA-submitted medical devices. More on the author.
