AI SaMD Playbook
    § Tool · GMLP self-assessment

    Score your program against the 10 GMLP principles.

    The 10 Good Machine Learning Practice (GMLP) principles were jointly published by FDA, Health Canada, and MHRA in October 2021. Score each principle to get a heatmap, a weighted maturity band, and a remediation list for every gap.
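
A minimal sketch of how a weighted maturity band and remediation list could be derived from per-principle scores. The page does not state the tool's scale, weights, or band cut-offs, so the 0-4 scale, the `BANDS` labels and thresholds, and the example weights below are all illustrative assumptions.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical 0-4 scale per principle; None means "not yet scored".
MAX_SCORE = 4

# Hypothetical band cut-offs on the weighted percentage (assumption, not from the tool).
BANDS: List[Tuple[float, str]] = [
    (85.0, "Optimized"),
    (65.0, "Managed"),
    (40.0, "Developing"),
    (0.0, "Initial"),
]


def maturity(
    scores: Dict[int, Optional[int]], weights: Dict[int, float]
) -> Tuple[float, str, List[int]]:
    """Return (weighted percentage, band label, principles needing remediation)."""
    total_weight = sum(weights[p] for p in scores)
    # Unscored principles count as 0 so that gaps lower the result instead of being ignored.
    earned = sum(weights[p] * (s or 0) for p, s in scores.items())
    pct = 100.0 * earned / (MAX_SCORE * total_weight)
    band = next(label for cutoff, label in BANDS if pct >= cutoff)
    # Flag anything unscored or more than one step below the top score.
    gaps = sorted(p for p, s in scores.items() if s is None or s < MAX_SCORE - 1)
    return round(pct, 1), band, gaps


# Example: equal weights, except principles 4 and 10 are weighted double here.
example_scores = {1: 3, 2: 4, 3: 2, 4: 4, 5: 3, 6: 3, 7: None, 8: 2, 9: 3, 10: 1}
example_weights = {p: 1.0 for p in range(1, 11)}
example_weights[4] = 2.0
example_weights[10] = 2.0
pct, band, gaps = maturity(example_scores, example_weights)
```

Treating an unscored principle as zero is a deliberate choice in this sketch: it keeps an incomplete assessment from looking more mature than it is.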

    Principle 01

    Multi-disciplinary expertise leveraged throughout the total product life cycle

    Clinical, engineering, ML, RA/QA, HF, and security roles named with documented decision rights across discovery, validation, release, and post-market.

    Named owners per lifecycle phase · Cross-functional design reviews · Escalation path documented
    Principle 02

    Good software engineering and security practices implemented

    IEC 62304-aligned SDLC plus FDA Feb 2026 cybersecurity expectations: SBOM/AI-BOM, threat model, secure build, signed releases, and vulnerability management.

    62304 plan · SBOM + AI-BOM · Threat model · Signed artifacts · Vulnerability handling SOP
    Principle 03

    Clinical study participants and data sets represent the intended patient population

    Subgroup coverage (age, sex, race, comorbidity, device, site) documented with quantitative gaps and a mitigation plan for under-represented strata.

    Subgroup table · Site/device diversity · Documented gaps + mitigations
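
One way to make "quantitative gaps" concrete is to compare each stratum's share of the dataset with its share of the intended population. The sketch below assumes you already have target shares from an epidemiological source; the `coverage_gaps` helper, the 5-point tolerance, and the example age bands are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def coverage_gaps(
    observed: List[str], target_share: Dict[str, float], tolerance: float = 0.05
) -> Dict[str, float]:
    """Return strata whose dataset share falls short of the target by more than `tolerance`."""
    if not observed:
        raise ValueError("observed must contain at least one record")
    counts = Counter(observed)
    n = len(observed)
    gaps: Dict[str, float] = {}
    for stratum, target in target_share.items():
        share = counts.get(stratum, 0) / n
        shortfall = target - share
        if shortfall > tolerance:
            gaps[stratum] = round(shortfall, 3)
    return gaps


# Example: age bands in a 100-record dataset versus assumed population shares.
ages = ["18-64"] * 70 + ["65-79"] * 25 + ["80+"] * 5
targets = {"18-64": 0.57, "65-79": 0.28, "80+": 0.15}
age_gaps = coverage_gaps(ages, targets)
```

Each flagged stratum would then need an entry in the mitigation plan, for example targeted enrollment or a labeling limitation.
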
    Principle 04

    Training data sets are independent of test sets

    No patient-level, site-level, or temporal leakage between training, tuning, and test partitions; splits are reproducible and version-locked.

    Split policy SOP · Patient/site/time leakage check · Frozen test set hash
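
The leakage check and frozen test set hash can be sketched with the standard library alone. This assumes each record carries a patient (or site) identifier and a date; the function names and the partition layout are illustrative, not part of any prescribed method.

```python
import hashlib
from datetime import date
from typing import Dict, Iterable, Sequence, Set


def leaked_ids(partitions: Dict[str, Iterable[str]]) -> Dict[str, Set[str]]:
    """Return IDs (patient or site) that appear in more than one partition."""
    sets = {name: set(ids) for name, ids in partitions.items()}
    names = list(sets)
    overlaps: Dict[str, Set[str]] = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            shared = sets[a] & sets[b]
            if shared:
                overlaps[f"{a}&{b}"] = shared
    return overlaps


def temporally_separated(train_dates: Sequence[date], test_dates: Sequence[date]) -> bool:
    """True when every training record predates every test record."""
    return max(train_dates) < min(test_dates)


def frozen_hash(record_ids: Iterable[str]) -> str:
    """SHA-256 over the sorted record IDs, so the hash does not depend on row order."""
    digest = hashlib.sha256()
    for rid in sorted(record_ids):
        digest.update(rid.encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


# Example: patient p3 appears in both the training and the test partition.
splits = {"train": ["p1", "p2", "p3"], "tune": ["p4"], "test": ["p3", "p5"]}
overlaps = leaked_ids(splits)
test_hash = frozen_hash(splits["test"])
```

Recording `test_hash` alongside the split policy lets a later audit confirm that the test set membership has not changed. Note that this hashes membership only; hashing the record contents as well would also catch edits to the data itself.
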
    Principle 05

    Selected reference datasets are based upon best available methods

    Reference standard / ground truth method is defensible, with adjudication, inter-rater agreement, and limitations stated.

    Reference standard SOP · Adjudication record · Inter-rater agreement
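
The page does not say which agreement statistic to report. Cohen's kappa is one common choice for two raters assigning categorical labels, and a minimal sketch looks like this; the example labels are made up.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(rater_a: Sequence[str], rater_b: Sequence[str]) -> float:
    """Cohen's kappa: agreement between two raters, corrected for chance agreement."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of cases")
    n = len(rater_a)
    # Observed agreement: fraction of cases where both raters gave the same label.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement if the raters labeled independently at their own base rates.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Example: two readers labeling the same ten cases.
reader_1 = ["pos", "pos", "neg", "neg", "pos", "neg", "neg", "neg", "pos", "neg"]
reader_2 = ["pos", "neg", "neg", "neg", "pos", "neg", "pos", "neg", "pos", "neg"]
kappa = cohens_kappa(reader_1, reader_2)
```

For more than two raters or ordinal labels, a different statistic (for example Fleiss' kappa or a weighted kappa) would be the better fit.
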
    Principle 06

    Model design is tailored to the available data and reflects intended use

    Architecture choice, input modalities, and output thresholds are justified against the intended-use statement and operating context.

    Design rationale · Threshold justification · Intended-use binding
    Principle 07

    Focus is placed on the performance of the human-AI team

    Performance is measured with the clinician in the loop where applicable; automation bias and override patterns are characterized.

    Reader study or HF eval · Override / automation-bias data · Labeling for clinician role
    Principle 08

    Testing demonstrates device performance during clinically relevant conditions

    Validation covers realistic site mix, scanner/device mix, image quality, and edge cases; out-of-scope inputs are characterized.

    Multi-site validation · Edge-case battery · OOD characterization
    Principle 09

    Users are provided clear, essential information

    Labeling discloses intended use, performance by subgroup, known limitations, OOD behavior, and any post-market changes (PCCP).

    Model card published · Subgroup performance in labeling · PCCP summary in labeling
    Principle 10

    Deployed models are monitored for performance and re-training risks are managed

    Drift, calibration, and subgroup metrics monitored on a schedule with quantitative thresholds and a documented response (PCCP, FSN, disablement).

    PMS plan with drift metrics · Quantitative thresholds · Documented response path

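
As one illustration of a drift metric with quantitative thresholds, the sketch below computes a population stability index (PSI) between a baseline sample and a production sample of model scores. PSI is a swapped-in example; the page does not name a specific metric. The bin edges, the 0.1 and 0.25 cut-offs (a widely used rule of thumb), and the response labels are assumptions to be replaced by whatever the PMS plan defines.

```python
import math
from typing import List, Sequence


def psi(
    baseline: Sequence[float],
    production: Sequence[float],
    edges: Sequence[float],
    eps: float = 1e-6,
) -> float:
    """Population stability index between two samples, binned at the given edges."""
    if not baseline or not production:
        raise ValueError("both samples must be non-empty")

    def shares(values: Sequence[float]) -> List[float]:
        counts = [0] * (len(edges) + 1)
        for v in values:
            counts[sum(v >= e for e in edges)] += 1  # index of the bin containing v
        # Floor empty bins at eps so the logarithm below stays defined.
        return [max(c / len(values), eps) for c in counts]

    base, prod = shares(baseline), shares(production)
    return sum((p - b) * math.log(p / b) for b, p in zip(base, prod))


def response(psi_value: float, warn: float = 0.1, act: float = 0.25) -> str:
    """Map a PSI value to a hypothetical response tier."""
    if psi_value >= act:
        return "escalate"
    if psi_value >= warn:
        return "investigate"
    return "continue"


# Example: production scores have shifted toward the top bin.
edges = [0.25, 0.5, 0.75]
baseline_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95]
production_scores = [0.8, 0.85, 0.9, 0.9, 0.95, 0.7, 0.6, 0.8, 0.9, 0.3]
tier = response(psi(baseline_scores, production_scores, edges))
```

In practice the "escalate" tier would point at the documented response named above (PCCP, FSN, or disablement), and the same pattern would be repeated for calibration and subgroup metrics.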