Blue Goat Cyber℠ · Medical Device Cybersecurity
    Guide · AI/ML

    GMLP Crosswalk: 10 Principles to Engineering Controls

    Each of the FDA/Health Canada/MHRA Good Machine Learning Practice principles mapped to concrete engineering, QMS, and documentation controls.


    By Christian Espinosa, MBA, CISSP

    Founder & CEO · Blue Goat Cyber

    Reviewed by Trevor Slattery

    COO · Blue Goat Cyber

    Last reviewed: May 1, 2026

    Reference Guide · Updated 2026 · 7 min read

    The 10 Good Machine Learning Practice (GMLP) guiding principles were jointly issued by FDA, Health Canada, and MHRA in October 2021. They are short, broadly worded, and easy to nod along to in a slide deck. The hard part is operationalizing them inside an actual QMS and engineering pipeline. This guide maps each principle to concrete controls.

    How to use this crosswalk

    For each principle, we list (a) what the principle says in plain English, (b) the engineering controls that satisfy it, and (c) the documentation reviewers expect to see. Map this against your existing QMS to find the gaps.

    1. Multi-disciplinary expertise applied throughout the lifecycle

    What it means: Software, clinical, regulatory, human factors, cybersecurity, and data-science expertise are involved from concept through postmarket.

    Engineering controls: Cross-functional design reviews at each gate; documented role assignments; clinical and human-factors representation on change-control board.

    Documentation: DHF entries showing multi-disciplinary review, RACI for the AI program, training records.

    2. Good software engineering and security practices

    What it means: Secure SDLC applies to AI components, including training pipelines and data infrastructure.

    Engineering controls: Version control for code, models, datasets, and configs; CI/CD with automated testing; secrets management; SAST/DAST; reproducible builds; signed model artifacts; supply-chain controls for third-party models and dependencies.

    Documentation: Secure SDLC procedure, SBOM (including ML-BOM with model and dataset provenance), threat model, pen test report.
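    The signed-artifact control above can be sketched as a digest-manifest step in a release pipeline - a minimal illustration, not a full signing implementation (file names and manifest layout here are hypothetical; a real pipeline would sign the manifest with a key, e.g. via Sigstore or GPG):

```python
import hashlib
import json
from pathlib import Path

def artifact_digest(path: Path) -> str:
    """SHA-256 digest of a model artifact, streamed in chunks to handle large files."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def write_manifest(artifacts: list[Path], out: Path) -> None:
    """Record artifact digests so the deploy stage can verify what it is shipping."""
    manifest = {p.name: artifact_digest(p) for p in artifacts}
    out.write_text(json.dumps(manifest, indent=2))
```

    The manifest itself becomes a controlled record: the deploy stage recomputes each digest and refuses to promote a build whose artifacts do not match.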

    3. Clinical study participants and data sets are representative of the intended patient population

    What it means: Training and validation data reflect the demographics, settings, and clinical realities the device will see.

    Engineering controls: Documented data-sourcing strategy; representativeness analysis (age, sex, race/ethnicity, comorbidities, sites, devices) before training; gap-fill plan when subgroups are under-represented.

    Documentation: Data management plan, dataset cards, representativeness report, subgroup-coverage matrix.
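    A representativeness check of this kind can be automated before training. The sketch below compares observed subgroup shares against target shares for the intended population and flags under-represented groups; the record schema, target proportions, and 80% floor are illustrative assumptions, not prescribed values:

```python
from collections import Counter

def subgroup_coverage(records, key, targets, floor=0.8):
    """Flag subgroups whose observed share falls below floor * target share.

    records: list of dicts with demographic fields (hypothetical schema).
    targets: {subgroup: expected proportion in the intended population}.
    """
    counts = Counter(r[key] for r in records)
    total = sum(counts.values())
    gaps = {}
    for group, target in targets.items():
        observed = counts.get(group, 0) / total if total else 0.0
        if observed < floor * target:
            gaps[group] = {"observed": round(observed, 3), "target": target}
    return gaps
```

    Running this per demographic axis (age band, sex, site, device vendor) yields the subgroup-coverage matrix the documentation calls for, and a non-empty result feeds the gap-fill plan.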

    4. Training data sets are independent of test sets

    What it means: No leakage between train, validation, and test sets - by patient, by site, by time.

    Engineering controls: Patient-level (not record-level) splits; temporal hold-outs for time-sensitive models; site-level hold-outs for generalization claims; automated leakage checks in the training pipeline.

    Documentation: Split methodology in the data management plan; leakage-check results in validation report.
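    An automated leakage check can be as simple as a set intersection on patient identifiers, run as a hard gate in the training pipeline. A minimal sketch, assuming records are dicts keyed by a patient_id field (the field name is an assumption):

```python
def leakage_check(train, test, key="patient_id"):
    """Fail the pipeline if any patient appears in both train and test sets."""
    overlap = {r[key] for r in train} & {r[key] for r in test}
    if overlap:
        raise ValueError(f"Patient-level leakage detected: {sorted(overlap)}")
    return True
```

    The same pattern extends to site-level and temporal hold-outs by swapping the key (e.g. site_id) or comparing date ranges; the point is that the check raises and stops the run rather than merely logging a warning.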

    5. Selected reference datasets are based upon best available methods

    What it means: Ground truth labels are produced by a defensible, reproducible process.

    Engineering controls: Multi-reader labeling with adjudication; documented label schema and edge cases; inter-rater agreement metrics; periodic label audits.

    Documentation: Labeling SOP, reader credentials, agreement metrics, audit logs.
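    For the agreement metrics, Cohen's kappa between two readers is a common starting point (multi-reader studies would typically use Fleiss' kappa or similar). A self-contained sketch:

```python
from collections import Counter

def cohens_kappa(reader1, reader2):
    """Cohen's kappa for two readers labeling the same cases.

    kappa = (p_observed - p_expected) / (1 - p_expected), where p_expected
    is the chance agreement implied by each reader's label frequencies.
    """
    assert len(reader1) == len(reader2), "readers must label the same cases"
    n = len(reader1)
    p_obs = sum(a == b for a, b in zip(reader1, reader2)) / n
    c1, c2 = Counter(reader1), Counter(reader2)
    p_exp = sum(c1[k] * c2[k] for k in set(c1) | set(c2)) / (n * n)
    return (p_obs - p_exp) / (1 - p_exp) if p_exp < 1 else 1.0
```

    Tracking this metric per label class over time also supports the periodic label audits: a drop in kappa is an early signal that the label schema or reader training needs revisiting.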

    6. Model design is tailored to the available data and reflects the intended use of the device

    What it means: The architecture and training approach fit the data volume, signal characteristics, and clinical question - not just whatever is fashionable.

    Engineering controls: Documented architecture-selection rationale; complexity justified by data volume and noise; explicit handling of class imbalance, missing data, and uncertainty.

    Documentation: Model design rationale section in the submission; experiment log showing alternatives considered.

    7. Focus is placed on the performance of the human-AI team

    What it means: The model is not evaluated in isolation - it is evaluated as part of the human workflow it supports.

    Engineering controls: Human-factors evaluation; reader studies that compare unaided vs AI-assisted performance; automation-bias and over-reliance assessment.

    Documentation: Human-factors report, reader-study results, IFU language addressing appropriate reliance and disagreement handling.
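    When the reader study pairs each reader's unaided and AI-assisted reads on the same cases, McNemar's test is one standard way to test whether assistance changed accuracy. A minimal sketch of the chi-square statistic with continuity correction (the study design it assumes - paired reads per case - is stated, not taken from this guide):

```python
def mcnemar_statistic(b, c):
    """McNemar chi-square with continuity correction for paired outcomes.

    b = cases correct unaided but wrong AI-assisted; c = the reverse.
    Compare against the chi-square(1 df) critical value, 3.84 at alpha=0.05.
    """
    if b + c == 0:
        return 0.0
    return (abs(b - c) - 1) ** 2 / (b + c)
```

    Note that a significant statistic only shows assistance changed performance; the automation-bias assessment still has to look at the direction and character of the discordant cases.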

    8. Testing demonstrates device performance during clinically relevant conditions

    What it means: Performance is shown across the conditions the device will encounter, including subgroups and edge cases.

    Engineering controls: Stratified evaluation across demographics, sites, scanner/sensor vendors, and clinical contexts; failure-mode characterization on edge cases; robustness testing.

    Documentation: Validation report with subgroup tables and confidence intervals; edge-case test catalog.
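    The subgroup tables with confidence intervals can be produced with a standard interval for proportions; the Wilson score interval behaves better than the normal approximation at the small subgroup sizes stratified evaluation tends to produce. A sketch at the conventional 95% level:

```python
import math

def wilson_ci(successes, n, z=1.96):
    """Wilson score interval for a proportion, e.g. per-subgroup sensitivity."""
    if n == 0:
        return (0.0, 1.0)
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return (centre - half, centre + half)
```

    A wide interval on a small subgroup is itself a finding: it tells the reviewer (and you) that the evidence for that stratum is thin, which loops back to the gap-fill plan under principle 3.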

    9. Users are provided clear, essential information

    What it means: Transparency to users about what the model does, how it performs, its inputs and limitations.

    Engineering controls: Model-card-style summary in labeling; subgroup performance disclosed where clinically relevant; explicit statement of intended use, environment, and limitations.

    Documentation: Transparency labeling, IFU performance section, user-facing release notes for new versions.

    10. Deployed models are monitored for performance and re-training risks are managed

    What it means: Ongoing performance, drift, and bias monitoring with documented thresholds and responses; retraining is controlled.

    Engineering controls: Telemetry for input distribution, output distribution, and (where feasible) outcomes; drift detection; subgroup performance dashboards; alert thresholds tied to PCCP triggers; rollback capability.

    Documentation: Postmarket monitoring plan, PCCP, drift-incident response procedure, periodic monitoring reports.
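    One common drift-detection technique for the input-distribution telemetry is the Population Stability Index (PSI) between a baseline window and a live window. A self-contained sketch - the 10-bin layout and the rule-of-thumb thresholds (PSI > 0.2 often treated as significant drift) are conventions to tune per your PCCP, not requirements from the principle:

```python
import math

def psi(baseline, live, bins=10):
    """Population Stability Index between a baseline and a live feature distribution.

    Bins are derived from the baseline's range; counts are smoothed so empty
    bins do not produce infinite log terms.
    """
    lo, hi = min(baseline), max(baseline)

    def bin_fractions(data):
        counts = [0] * bins
        for x in data:
            i = min(int((x - lo) / (hi - lo) * bins), bins - 1) if hi > lo else 0
            counts[max(i, 0)] += 1
        # Additive smoothing keeps every bin fraction strictly positive.
        return [(c + 0.5) / (len(data) + 0.5 * bins) for c in counts]

    b, l = bin_fractions(baseline), bin_fractions(live)
    return sum((li - bi) * math.log(li / bi) for bi, li in zip(b, l))
```

    In practice this runs per feature on a schedule, and a PSI crossing the documented threshold opens a drift incident under the response procedure rather than silently retraining.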

    How GMLP maps into your submission

    The 10 principles do not have a dedicated section in a 510(k) - they show up as evidence inside the standard sections: design controls, software documentation, risk management file, cybersecurity documentation, validation report, labeling, and PCCP. A practical move is to maintain an internal GMLP traceability matrix that maps each principle to the documents and controls that satisfy it, so when a reviewer asks, the answer is one row in a table.
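    Such a traceability matrix can live as a plain data structure (or spreadsheet) that tooling and audits can query. A hypothetical sketch with two principles filled in - the document names are examples, not a prescribed list:

```python
# Hypothetical internal GMLP traceability matrix: principle number -> evidence.
GMLP_MATRIX = {
    1: {
        "controls": ["cross-functional design reviews", "AI program RACI"],
        "docs": ["DHF review records", "training records"],
    },
    2: {
        "controls": ["CI/CD with automated testing", "signed model artifacts"],
        "docs": ["Secure SDLC SOP", "SBOM/ML-BOM", "threat model", "pen test report"],
    },
    # ... principles 3-10 filled in the same shape.
}

def evidence_for(principle: int) -> dict:
    """The one-row answer when a reviewer asks where a principle is satisfied."""
    return GMLP_MATRIX[principle]
```

    Keeping it in version control alongside the QMS documents means the matrix is reviewed and updated through the same change-control process as the evidence it points to.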

    Sources

    • FDA / Health Canada / MHRA, Good Machine Learning Practice for Medical Device Development: Guiding Principles (October 2021)
    • FDA, Artificial Intelligence-Enabled Device Software Functions: Lifecycle Management and Marketing Submission Recommendations (Draft, January 2025)