§ Tool · GMLP self-assessment

Score your program against the 10 GMLP principles.

These are the 10 Good Machine Learning Practice (GMLP) guiding principles jointly published by FDA, Health Canada, and MHRA in October 2021. Score each principle to get a heatmap, a weighted maturity band, and a remediation list covering every gap.

Principle 01

Multi-disciplinary expertise leveraged throughout the total product life cycle

None

Clinical, engineering, ML, RA/QA, HF, and security roles named with documented decision rights across discovery, validation, release, and post-market.

Named owners per lifecycle phase · Cross-functional design reviews · Escalation path documented

Principle 02

Good software engineering and security practices implemented

None

IEC 62304-aligned SDLC plus FDA Feb 2026 cybersecurity expectations: SBOM/AI-BOM, threat model, secure build, signed releases, and vulnerability management.

62304 plan · SBOM + AI-BOM · Threat model · Signed artifacts · Vulnerability handling SOP

Principle 03

Clinical study participants and data sets represent the intended patient population

None

Subgroup coverage (age, sex, race, comorbidity, device, site) documented with quantitative gaps and a mitigation plan for under-represented strata.

Subgroup table · Site/device diversity · Documented gaps + mitigations
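One way to quantify under-representation for the subgroup table is to compare each stratum's share of the dataset against its share of the intended population. The sketch below is illustrative only: the function name `coverage_gaps`, the `min_ratio` tolerance of 0.8, and all counts and target shares are hypothetical and are not part of the GMLP principles or this tool.

```python
def coverage_gaps(dataset_counts, population_share, min_ratio=0.8):
    """Flag strata whose dataset share falls below min_ratio times the target share.

    Returns {stratum: observed_share - target_share} for flagged strata only.
    """
    total = sum(dataset_counts.values())
    if total == 0:
        raise ValueError("dataset_counts must contain at least one record")
    gaps = {}
    for stratum, target in population_share.items():
        observed = dataset_counts.get(stratum, 0) / total
        if observed < min_ratio * target:
            gaps[stratum] = round(observed - target, 3)
    return gaps


# Hypothetical age strata: record counts in the dataset and the share of
# each stratum in the intended patient population.
counts = {"18-39": 300, "40-64": 550, "65+": 150}
target = {"18-39": 0.30, "40-64": 0.40, "65+": 0.30}

gaps = coverage_gaps(counts, target)
print(gaps)
```

Here only the 65+ stratum is flagged, because its 15% dataset share falls below 80% of its 30% target. The same check can be repeated per dimension (sex, race, comorbidity, device, site), with each flagged stratum feeding the mitigation plan.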

Principle 04

Training data sets are independent of test sets

None

No patient-level, site-level, or temporal leakage between training, tuning, and test partitions; splits are reproducible and version-locked.

Split policy SOP · Patient/site/time leakage check · Frozen test set hash
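A leakage check and a frozen test set hash can be sketched in a few lines. This is a minimal illustration, assuming each record is a dict with hypothetical `record_id`, `patient_id`, `site_id`, and ISO-format `date` fields, and assuming a split policy in which all training data predates the test data. None of these names or policies come from the GMLP text.

```python
import hashlib


def find_leakage(train, test, key):
    """Return the values of `key` that occur in both partitions."""
    return {r[key] for r in train} & {r[key] for r in test}


def temporal_overlap(train, test, key="date"):
    """True if any training record is dated on or after the earliest test record.

    ISO 8601 date strings (YYYY-MM-DD) compare correctly as plain strings.
    """
    return max(r[key] for r in train) >= min(r[key] for r in test)


def partition_hash(records, id_key="record_id"):
    """Order-independent SHA-256 over the sorted record identifiers."""
    digest = hashlib.sha256()
    for rid in sorted(str(r[id_key]) for r in records):
        digest.update(rid.encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


# Hypothetical partitions; patient p2 deliberately appears in both.
train = [
    {"record_id": "r1", "patient_id": "p1", "site_id": "A", "date": "2022-01-10"},
    {"record_id": "r2", "patient_id": "p2", "site_id": "A", "date": "2022-03-05"},
]
test = [
    {"record_id": "r3", "patient_id": "p2", "site_id": "B", "date": "2023-02-01"},
    {"record_id": "r4", "patient_id": "p3", "site_id": "B", "date": "2023-04-12"},
]

patient_leaks = find_leakage(train, test, "patient_id")
site_leaks = find_leakage(train, test, "site_id")
time_leak = temporal_overlap(train, test)
test_hash = partition_hash(test)

print("patient leakage:", patient_leaks)
print("site leakage:", site_leaks)
print("temporal overlap:", time_leak)
print("frozen test hash:", test_hash)
```

Recording `test_hash` alongside the split policy lets a later audit confirm that the test partition has not changed. Hashing only identifiers is a simplification; a stricter variant would hash the record contents as well.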

Principle 05

Selected reference datasets are based upon best available methods

None

Reference standard / ground truth method is defensible, with adjudication, inter-rater agreement, and limitations stated.

Reference standard SOP · Adjudication record · Inter-rater agreement
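Inter-rater agreement is often reported as Cohen's kappa, which corrects the observed agreement between two raters for the agreement expected by chance. The sketch below is one possible statistic, not one the principles prescribe; the labels and ratings are made up for illustration.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labeling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    # Fraction of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical ratings from two readers on eight cases.
rater_a = ["pos", "pos", "neg", "neg", "pos", "neg", "neg", "pos"]
rater_b = ["pos", "neg", "neg", "neg", "pos", "neg", "pos", "pos"]

kappa = cohens_kappa(rater_a, rater_b)
print(f"kappa = {kappa:.2f}")
```

In this example the readers agree on 6 of 8 cases (0.75) and chance agreement is 0.50, so kappa is 0.50. With more than two raters or ordinal labels, a different statistic (for example Fleiss' kappa or a weighted kappa) would be the usual choice.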

Principle 06

Model design is tailored to the available data and reflects intended use

None

Architecture choice, input modalities, and output thresholds are justified against the intended-use statement and operating context.

Design rationale · Threshold justification · Intended-use binding

Principle 07

Focus is placed on the performance of the human-AI team

None

Performance is measured with the clinician in the loop where applicable; automation bias and override patterns are characterized.

Reader study or HF eval · Override / automation-bias data · Labeling for clinician role

Principle 08

Testing demonstrates device performance during clinically relevant conditions

None

Validation covers realistic site mix, scanner/device mix, image quality, and edge cases; out-of-scope inputs are characterized.

Multi-site validation · Edge-case battery · OOD characterization

Principle 09

Users are provided clear, essential information

None

Labeling discloses intended use, performance by subgroup, known limitations, OOD behavior, and any post-market changes (PCCP).

Model card published · Subgroup performance in labeling · PCCP summary in labeling

Principle 10

Deployed models are monitored for performance and re-training risks managed

None

Drift, calibration, and subgroup metrics monitored on a schedule with quantitative thresholds and a documented response (PCCP, FSN, disablement).

PMS plan with drift metrics · Quantitative thresholds · Documented response path
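To show what a quantitative threshold bound to a documented response might look like, the sketch below computes a population stability index (PSI) between a baseline and a current binned distribution and maps the result to an action. PSI is only one possible drift metric. The 0.10 and 0.25 cut-offs are a common rule of thumb rather than values from the principles, and the response strings are hypothetical placeholders for whatever the PMS plan defines.

```python
import math


def psi(expected, actual, eps=1e-6):
    """Population stability index between two binned proportion vectors."""
    if len(expected) != len(actual):
        raise ValueError("both distributions must use the same bins")
    total = 0.0
    for e, a in zip(expected, actual):
        # Clamp to eps so empty bins do not cause log(0) or division by zero.
        e, a = max(e, eps), max(a, eps)
        total += (a - e) * math.log(a / e)
    return total


# Hypothetical thresholds and responses, ordered from most to least severe.
RESPONSES = [
    (0.25, "escalate: evaluate field-safety action or disablement"),
    (0.10, "investigate: open a drift review"),
]


def response_for(value):
    """Return the response bound to the highest threshold that `value` reaches."""
    for threshold, action in RESPONSES:
        if value >= threshold:
            return action
    return "none: continue scheduled monitoring"


# Hypothetical share of inputs per bin at validation time and in the field.
baseline = [0.25, 0.25, 0.25, 0.25]
current = [0.10, 0.20, 0.30, 0.40]

drift = psi(baseline, current)
action = response_for(drift)
print(f"PSI = {drift:.3f} -> {action}")
```

With these made-up distributions the PSI is roughly 0.23, which lands in the "investigate" tier. The same pattern extends to calibration and subgroup metrics by giving each metric its own threshold table.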

Maturity score

0 / 30 · Initial
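The score could be computed along the following lines. This is a sketch under stated assumptions: the 0 to 3 scale per principle is inferred from the 0 / 30 total, the weights default to equal because the tool's actual weighting is not given, and every band name and cut-off other than "Initial" is hypothetical.

```python
MAX_PER_PRINCIPLE = 3  # assumed from the 0 / 30 total across 10 principles

# Hypothetical bands as (minimum fraction of the maximum score, name).
# Only "Initial" appears in the tool output; the rest are placeholders.
BANDS = [
    (0.85, "Optimized"),
    (0.60, "Managed"),
    (0.30, "Developing"),
    (0.00, "Initial"),
]


def maturity(scores, weights=None):
    """Return (earned, possible, band, gaps) for per-principle scores."""
    weights = weights or {p: 1.0 for p in scores}
    for principle, score in scores.items():
        if not 0 <= score <= MAX_PER_PRINCIPLE:
            raise ValueError(f"{principle}: score must be 0..{MAX_PER_PRINCIPLE}")
    earned = sum(scores[p] * weights[p] for p in scores)
    possible = sum(MAX_PER_PRINCIPLE * weights[p] for p in scores)
    fraction = earned / possible
    band = next(name for floor, name in BANDS if fraction >= floor)
    # Any principle below the maximum counts as a gap needing remediation.
    gaps = sorted(p for p, s in scores.items() if s < MAX_PER_PRINCIPLE)
    return earned, possible, band, gaps


# All ten principles unscored, matching the state shown above.
scores = {f"P{i:02d}": 0 for i in range(1, 11)}
earned, possible, band, gaps = maturity(scores)
print(f"{earned:g} / {possible:g} · {band}")
print("gaps:", ", ".join(gaps))
```

The returned `gaps` list uses the same P01 to P10 identifiers as the remediation list, so each gap can be looked up directly against its remediation item.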

Remediation list

P01 Define a RACI across ML, clinical, RA/QA, security, and human factors and bind it to the QMS lifecycle procedure.
P02 Add an AI-BOM beside the SBOM and tie the threat model to the section structure of the FDA Feb 2026 cybersecurity guidance.
P03 Publish a subgroup coverage matrix and an under-representation remediation plan inside the validation report.
P04 Add a leakage audit (patient ID, site ID, timestamp) and freeze the test partition with a recorded hash.
P05 Document the reference-standard rationale, adjudication rules, and inter-rater agreement statistics.
P06 Write a design-rationale memo linking model choice and thresholds to the intended-use statement.
P07 Add a reader study or human-factors evaluation measuring the human-AI team, not just the model in isolation.
P08 Expand validation to additional sites/devices and add an explicit out-of-distribution / edge-case test battery.
P09 Promote the internal model card into user-facing labeling with subgroup performance and PCCP summary.
P10 Add quantitative drift thresholds to the PMS plan and bind each threshold to a PCCP / field-safety response.

Informational only · not a regulatory determination. Scoring rubric derived from the FDA / Health Canada / MHRA GMLP guiding principles (Oct 2021).