AI SaMD Playbook
    Starter guide · v1

    Model cards that hold up in a regulatory review.

    A working template, twelve-section checklist, and worked examples for assembling a model card that answers what FDA, MHRA, Health Canada, and EU notified bodies actually ask. Mapped to the guiding documents already cited on this site.

    Built on
    • FDA Transparency Guiding Principles
    • FDA / Health Canada / MHRA, Good Machine Learning Practice
    • FDA PCCP final guidance · Aug 2025
    • EU AI Act Art. 13 & 113 · MDCG 2025-6
    Six anchor sections

    The structure reviewers expect to find.

    Each section pairs the regulator's question with a working example of the answer. Use the downloadable checklist for the full twelve-section version.

    §01

    Intended use & indications

    Anchor every downstream claim. Reviewers compare every metric and limitation against the words you put here.

    Source · FDA, Transparency for ML-enabled devices ↗
    Required fields
    • Clinical task in plain language
    • Target patient population (age, sex, comorbidities)
    • Care setting and user profile
    • Explicit out-of-scope uses & contraindications
    Worked example
    Task
    Triage of non-contrast head CT for suspected intracranial haemorrhage
    Population
    Adults (≥18 years) presenting to the ED with suspected acute stroke
    User
    Board-certified radiologist; not for use without expert review
    Out of scope
    Paediatric, post-operative, or contrast-enhanced studies
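    Reviewers check later sections against this wording. One option is to keep the §01 fields as a single structured record that the rest of the card, and the software, can read from. A minimal Python sketch of that idea, using the worked-example values; the `IntendedUse` class and its field names are hypothetical, not a regulator-defined schema.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class IntendedUse:
    """Hypothetical structured form of the §01 fields; names are illustrative."""
    task: str
    population: str
    min_age_years: int
    user: str
    out_of_scope: tuple[str, ...] = field(default_factory=tuple)

    def to_json(self) -> str:
        # json serialises the out_of_scope tuple as a JSON array.
        return json.dumps(asdict(self), indent=2, ensure_ascii=False)


ICH_TRIAGE = IntendedUse(
    task="Triage of non-contrast head CT for suspected intracranial haemorrhage",
    population="Adults presenting to ED with suspected acute stroke",
    min_age_years=18,
    user="Board-certified radiologist; not for use without expert review",
    out_of_scope=("paediatric", "post-operative", "contrast-enhanced"),
)

print(ICH_TRIAGE.to_json())
```

    Freezing the record keeps any single run of the pipeline from quietly editing the claims it is meant to be checked against.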
    §02

    Training data provenance

    Reviewers want to know who is represented, who isn't, and why. Gaps disclosed up front are mitigations; gaps discovered later are findings.

    Source · FDA, GMLP guiding principles ↗
    Required fields
    • Sources, institutions, geographies, years
    • Sample size + class balance
    • Demographic + device distribution
    • Inclusion / exclusion criteria
    • Labelling protocol + inter-rater agreement
    • Known representational gaps
    Worked example
    Sources
    4 US academic centres + 1 EU teaching hospital, 2018–2023
    N
    12,840 studies; 18.4% positive for ICH
    Scanners
    GE, Siemens, Canon, 64-slice and above
    Gap
    Under-representation of patients aged <30 years and of non-contrast studies from <16-slice scanners
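    The inter-rater agreement field is usually reported as a chance-corrected statistic. Below is a minimal Python sketch of Cohen's kappa for two readers labelling the same studies. The toy labels are illustrative, not drawn from the worked example. Protocols with more than two readers per study generally need a multi-rater statistic such as Fleiss' kappa instead.

```python
from collections import Counter


def cohens_kappa(rater_a: list[str], rater_b: list[str]) -> float:
    """Cohen's kappa for two raters labelling the same set of studies."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of studies")
    n = len(rater_a)
    # Observed agreement: share of studies where both raters chose the same label.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical label throughout; kappa is undefined.
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# Toy example: 10 studies, readers disagree on two of them.
reader_1 = ["ICH", "ICH", "none", "none", "ICH", "none", "none", "none", "ICH", "none"]
reader_2 = ["ICH", "none", "none", "none", "ICH", "none", "ICH", "none", "ICH", "none"]
print(round(cohens_kappa(reader_1, reader_2), 3))
```

    Here raw agreement is 80%, but kappa is noticeably lower because both readers label most studies "none", so much of that agreement is expected by chance.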
    §03

    Performance, overall and by subgroup

    Headline metrics are not enough. Stratified results are now an explicit expectation in FDA, MHRA and Health Canada review.

    Source · FDA, GMLP guiding principles ↗
    Required fields
    • Independent test set definition (site / patient / time split)
    • Primary metrics with 95% CIs
    • Subgroup table: sex, age band, race/ethnicity, device, site
    • Calibration plot or expected calibration error (ECE)
    • External validation cohort
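    If calibration is reported as a single number, ECE groups predictions into bins by predicted probability. Within each bin it compares the mean predicted probability with the observed positive rate, then averages those gaps weighted by bin size. A minimal Python sketch of one common binary variant with ten equal-width bins follows. The bin count and binning scheme are choices, not a standard, so they are worth stating next to the value.

```python
def expected_calibration_error(probs: list[float], labels: list[int], n_bins: int = 10) -> float:
    """Equal-width-bin ECE for a binary classifier's positive-class probability."""
    if len(probs) != len(labels) or not probs:
        raise ValueError("probs and labels must be the same non-empty length")
    bins: list[list[tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        # p == 1.0 would index one past the end, so clamp it into the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))
    n = len(probs)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        mean_prob = sum(p for p, _ in bucket) / len(bucket)
        observed_rate = sum(y for _, y in bucket) / len(bucket)
        # Weight each bin's calibration gap by its share of all predictions.
        ece += len(bucket) / n * abs(mean_prob - observed_rate)
    return ece
```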
    Worked example
    Sensitivity
    94.1% (92.7–95.3)
    Specificity
    89.6% (88.4–90.7)
    Sens. (female ≥65)
    91.0% (87.8–93.6), flagged for monitoring
    External cohort
    n=2,104, EU site, sens. 92.4% / spec. 88.1%
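    The source does not say how the worked-example intervals were computed. One common choice for a CI on sensitivity or specificity is the Wilson score interval. It is sketched below in Python together with a per-subgroup sensitivity breakdown. The row format and subgroup labels are hypothetical. Exact (Clopper–Pearson) or bootstrap intervals are reasonable alternatives, and the method used should be named in the card.

```python
import math


def wilson_interval(successes: int, total: int, z: float = 1.959964) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (default z gives ~95%)."""
    if total <= 0 or not 0 <= successes <= total:
        raise ValueError("need total > 0 and 0 <= successes <= total")
    p = successes / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half


def sensitivity_by_subgroup(rows: list[dict]) -> dict[str, tuple[float, float, float]]:
    """Sensitivity with a Wilson 95% CI per subgroup: (point, lower, upper).

    Each row is a hypothetical record {"subgroup": str, "label": 0 or 1, "pred": 0 or 1}.
    """
    positives: dict[str, int] = {}
    true_pos: dict[str, int] = {}
    for r in rows:
        if r["label"] != 1:
            continue  # sensitivity is computed over truly positive studies only
        g = r["subgroup"]
        positives[g] = positives.get(g, 0) + 1
        true_pos[g] = true_pos.get(g, 0) + (1 if r["pred"] == 1 else 0)
    result = {}
    for g, n_pos in positives.items():
        lo, hi = wilson_interval(true_pos[g], n_pos)
        result[g] = (true_pos[g] / n_pos, lo, hi)
    return result
```

    Small subgroups produce wide intervals. The width of the interval, not only the point estimate, is what supports a "flagged for monitoring" decision like the one for females ≥65.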
    §04

    Limitations & failure modes

    Documenting where the model breaks is an expectation under the FDA transparency guiding principles and a transparency obligation under EU AI Act Article 13.

    Source · EU AI Act, consolidated text (Article 13) ↗
    Required fields
    • Documented failure modes
    • Populations / settings where performance is degraded
    • Open bugs and CAPAs disclosed to deployers
    • Behaviour on out-of-distribution inputs
    Worked example
    Failure mode
    Motion-degraded studies → confidence < 0.6 returned with abstention
    Degraded setting
    Non-contrast studies from <16-slice scanners not validated; device blocks inference
    OOD
    Paediatric input → input-validation error, no score returned
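    The three rows above describe behaviour in the inference path. Out-of-scope inputs are rejected before any score exists, and low-confidence outputs abstain. A minimal Python sketch of that ordering follows. The thresholds are the worked-example values, the contrast check comes from the §01 out-of-scope list, and the `Study` fields, `Result` type and `infer` callable are hypothetical stand-ins, not a real device API.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Thresholds copied from the worked examples; names are hypothetical.
MIN_AGE_YEARS = 18        # §01 population: adults >= 18
MIN_SCANNER_SLICES = 16   # degraded setting: <16-slice scanners not validated
ABSTAIN_BELOW = 0.6       # failure mode: confidence < 0.6 -> abstain


@dataclass
class Study:
    # Hypothetical, already-extracted metadata; DICOM parsing is out of scope here.
    patient_age_years: int
    contrast_enhanced: bool
    scanner_slices: int


@dataclass
class Result:
    status: str                    # "rejected", "abstained" or "scored"
    reason: str = ""
    score: Optional[float] = None  # only set when status == "scored"


def run_with_gate(study: Study, infer: Callable[[Study], tuple[float, float]]) -> Result:
    """Validate inputs before inference, then abstain on low confidence.

    `infer` stands in for the model call and returns (score, confidence).
    """
    # Out-of-scope inputs are rejected before the model runs, so no score is produced.
    if study.patient_age_years < MIN_AGE_YEARS:
        return Result("rejected", "paediatric input: outside intended use")
    if study.contrast_enhanced:
        return Result("rejected", "contrast-enhanced study: outside intended use")
    if study.scanner_slices < MIN_SCANNER_SLICES:
        return Result("rejected", "scanner below validated slice count")
    score, confidence = infer(study)
    # Low-confidence outputs (e.g. motion-degraded studies) abstain instead of scoring.
    if confidence < ABSTAIN_BELOW:
        return Result("abstained", f"confidence {confidence:.2f} below {ABSTAIN_BELOW}")
    return Result("scored", score=score)
```

    Returning a status rather than a bare number makes "no score returned" testable. Each rejection and abstention branch can then be exercised in verification and cited in this section.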
    §05

    Cybersecurity posture

    AI inherits every classical software threat and adds adversarial inputs, prompt injection, and model extraction. Both FDA and EU MDR Annex I §17 expect explicit handling.

    Source · FDA, Premarket cybersecurity guidance (2023) ↗
    Required fields
    • SBOM including model weights + inference runtime
    • Threat model + risk register reference
    • Adversarial-robustness testing for the modality in scope
    • Coordinated vulnerability disclosure contact
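    FGSM is easiest to see on a linear scorer, where the input gradient has a closed form. Below is a toy Python sketch assuming a logistic-regression model with inputs normalised to [0, 1], which is also what an ε such as 2/255 presupposes. For an imaging CNN the gradient would come from the framework's autograd instead. PGD repeats this step several times, projecting back into the ε-ball after each one.

```python
import math


def _sign(v: float) -> int:
    return (v > 0) - (v < 0)


def score(x: list[float], w: list[float], b: float) -> float:
    """Positive-class probability from a logistic-regression scorer (numerically stable)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fgsm_linear(x: list[float], y: int, w: list[float], b: float, eps: float) -> list[float]:
    """One FGSM step against binary cross-entropy.

    For p = sigmoid(w.x + b), the loss gradient with respect to x is (p - y) * w,
    so each feature moves by eps in the sign of that gradient, then is clipped
    back to the assumed [0, 1] input range.
    """
    p = score(x, w, b)
    grad = [(p - y) * wi for wi in w]
    return [min(1.0, max(0.0, xi + eps * _sign(gi))) for xi, gi in zip(x, grad)]


def accuracy(xs: list[list[float]], ys: list[int], w: list[float], b: float,
             threshold: float = 0.5) -> float:
    correct = sum((score(x, w, b) >= threshold) == (y == 1) for x, y in zip(xs, ys))
    return correct / len(xs)
```

    One way to express a degradation figure is the drop from `accuracy` (or sensitivity) on clean test inputs to the same metric on their FGSM or PGD counterparts at the stated ε. The metric used should be named alongside the number.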
    Worked example
    SBOM
    SPDX 2.3 generated per release; weights pinned by SHA-256
    Adversarial
    FGSM + PGD evaluated; degradation < 3% at ε=2/255
    CVD
    security@vendor.example, 90-day disclosure window
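    Pinning weights by SHA-256 implies a check at load time. A minimal sketch using only Python's standard library follows. Where the expected digest is stored (for example, alongside the weights entry in the release SBOM) is an assumption, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of a file, streamed so large weight files need not fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_pinned_weights(path: Path, expected_hex: str) -> None:
    """Refuse to continue if the weights on disk differ from the digest pinned at release."""
    actual = sha256_of(path)
    if actual != expected_hex.lower():
        raise RuntimeError(f"weights digest mismatch for {path.name}: {actual} != {expected_hex}")
```

    Calling `verify_pinned_weights` before the model is deserialised means a tampered or mis-deployed file fails closed rather than producing scores.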
    §06

    Lifecycle & change control (PCCP)

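    Under FDA's August 2025 final PCCP guidance, an authorised PCCP sets out which modifications can be made without a new submission. A modification outside the PCCP goes through the usual change assessment and may require a new marketing submission.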

    Source · FDA, PCCP final guidance (Aug 2025) ↗
    Required fields
    • PCCP on file? Reference document ID
    • Modification protocol: what can change, how, with what limits
    • Performance monitoring metrics + alert thresholds
    • Drift detection method
    • Rollback path
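    The alert rule in the worked example below (−2σ from baseline) can be read as a lower control limit on a monitored metric. A minimal Python sketch follows. It assumes σ is the sample standard deviation of baseline weekly values for each site, and the function names are hypothetical. The rule tracks performance rather than the input distribution, so it addresses the monitoring field more directly than the drift-detection field. Weekly sensitivity also depends on ground-truth labels arriving, and small weekly positive counts make it noisy.

```python
import statistics


def below_control_limit(baseline: list[float], current: float, k: float = 2.0) -> bool:
    """True when `current` is more than k sample standard deviations below the baseline mean."""
    if len(baseline) < 2:
        raise ValueError("need at least two baseline weeks to estimate a standard deviation")
    limit = statistics.mean(baseline) - k * statistics.stdev(baseline)
    return current < limit


def sites_to_alert(baseline_by_site: dict[str, list[float]],
                   this_week: dict[str, float], k: float = 2.0) -> list[str]:
    """Sites whose weekly sensitivity breaches the lower control limit.

    Sites without a baseline are skipped here; how to handle them is a separate decision.
    """
    return sorted(
        site for site, value in this_week.items()
        if site in baseline_by_site and below_control_limit(baseline_by_site[site], value, k)
    )
```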
    Worked example
    PCCP scope
    Retraining with +20% data per quarter; decision-threshold tuning within ±0.05
    Monitoring
    Weekly sensitivity by site; alert at −2σ from baseline
    Rollback
    Previous model retained for 24 months; rollback ≤ 4h
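    The numeric limits in the PCCP scope row can be checked mechanically before a modification ships. A minimal Python sketch follows. It assumes "+20% data" means new training data of at most 20% of the current set per quarter, and that the ±0.05 applies to the shift from the authorised operating threshold. The class names are hypothetical. It only tests the two numeric bounds. The modification protocol's verification and acceptance criteria would still apply.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PccpEnvelope:
    # Limits from the worked example; interpretation and names are assumptions.
    max_added_data_fraction: float = 0.20  # "+20% data per quarter"
    max_threshold_shift: float = 0.05      # "threshold tuning +/-0.05"


@dataclass(frozen=True)
class ProposedChange:
    added_data_fraction: float  # new training data as a fraction of the current set
    threshold_shift: float      # proposed minus authorised operating threshold


def outside_envelope(change: ProposedChange, env: PccpEnvelope = PccpEnvelope()) -> list[str]:
    """Reasons a change falls outside the PCCP; an empty list means within scope."""
    reasons = []
    if change.added_data_fraction > env.max_added_data_fraction:
        reasons.append("added data exceeds the quarterly PCCP limit")
    if abs(change.threshold_shift) > env.max_threshold_shift:
        reasons.append("threshold shift exceeds the PCCP tuning range")
    return reasons
```

    A non-empty result means the change is outside the PCCP and goes through the normal change assessment.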
    Take it with you

    The full twelve-section checklist, as Markdown.

    Drop it in your DMS, paste it into your QMS, or hand it to the model team. Adapted from the same primary sources cited across the rest of this site.