Starter guide · v1

Model cards that hold up in a regulatory review.

A working template, a twelve-section checklist, and worked examples for assembling a model card that answers what FDA, MHRA, Health Canada, and EU notified bodies actually ask, mapped to the guiding documents already cited on this site.

Built on

FDA Transparency Guiding Principles
FDA / Health Canada / MHRA, Good Machine Learning Practice
FDA PCCP final guidance · Aug 2025
EU AI Act Art. 13 & 113 · MDCG 2025-6

Six anchor sections

The structure reviewers expect to find.

Each section pairs the regulator's question with a working example of the answer. Use the downloadable checklist for the full twelve-section version.

§01

Intended use & indications

This section anchors every downstream claim: reviewers check each metric and limitation against the wording you put here.

Source · FDA, Transparency for ML-enabled devices ↗

Required fields

Clinical task in plain language
Target patient population (age, sex, comorbidities)
Care setting and user profile
Explicit out-of-scope uses & contraindications

Worked example

Task: Triage of non-contrast head CT for suspected intracranial haemorrhage
Population: Adults ≥18 presenting to ED with suspected acute stroke
User: Board-certified radiologist; not for use without expert review
Out of scope: Paediatric, post-operative, or contrast-enhanced studies

§02

Training data provenance

Reviewers want to know who is represented, who isn't, and why. Gaps disclosed up front are mitigations; gaps discovered later are findings.

Source · FDA, GMLP guiding principles ↗

Required fields

Sources, institutions, geographies, years
Sample size + class balance
Demographic + device distribution
Inclusion / exclusion criteria
Labelling protocol + inter-rater agreement
Known representational gaps

Worked example

Sources: 4 US academic centres + 1 EU teaching hospital, 2018–2023
N: 12,840 studies; 18.4% positive for ICH
Scanners: GE, Siemens, Canon, 64-slice and above
Gap: Under-representation of patients <30 and of non-contrast studies from scanners <16-slice
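
Inter-rater agreement for a two-reader labelling protocol is often reported as Cohen's kappa. The sketch below is one way to compute it, assuming two readers assigned binary ICH labels to the same studies. The function name and the ten toy labels are illustrative only and do not come from the worked example or from any cited guidance.

```python
from collections import Counter


def cohens_kappa(reader_a, reader_b):
    """Cohen's kappa for two readers labelling the same items."""
    if len(reader_a) != len(reader_b) or not reader_a:
        raise ValueError("readers must label the same non-empty set of items")
    n = len(reader_a)
    # Observed agreement: fraction of items where both readers agree.
    observed = sum(a == b for a, b in zip(reader_a, reader_b)) / n
    # Chance agreement: product of each reader's marginal label rates.
    counts_a, counts_b = Counter(reader_a), Counter(reader_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both readers used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels (1 = ICH present, 0 = absent) for ten studies.
a = [1, 1, 0, 0, 0, 1, 0, 0, 1, 0]
b = [1, 0, 0, 0, 0, 1, 0, 1, 1, 0]
kappa = cohens_kappa(a, b)
```

Report kappa alongside the raw agreement rate and the adjudication rule for disagreements. With a minority positive class, raw agreement alone tends to look better than the labelling really is.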

§03

Performance, overall and by subgroup

Headline metrics are not enough. Stratified results are now an explicit expectation in FDA, MHRA and Health Canada review.

Source · FDA, GMLP guiding principles ↗

Required fields

Independent test set definition (site / patient / time split)
Primary metrics with 95% CIs
Subgroup table: sex, age band, race/ethnicity, device, site
Calibration plot or ECE
External validation cohort
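
For the calibration field, expected calibration error (ECE) can be computed in a few lines. This is a minimal sketch of the positive-class variant with equal-width bins: it compares the mean predicted probability of ICH in each bin with the observed positive rate. The bin count of 10 and the function name are assumptions, so state whichever binning you actually use in the card.

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """Bin-weighted mean of |observed positive rate - mean predicted probability|."""
    if len(probs) != len(labels) or not probs:
        raise ValueError("probs and labels must have the same non-zero length")
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        # Equal-width bins on [0, 1]; p == 1.0 is clamped into the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_prob = sum(p for p, _ in members) / len(members)
        pos_rate = sum(y for _, y in members) / len(members)
        ece += (len(members) / len(probs)) * abs(pos_rate - mean_prob)
    return ece


# Hypothetical scores: slightly under-confident at both ends.
ece = expected_calibration_error([0.9, 0.9, 0.1, 0.1], [1, 1, 0, 0])
```

ECE depends on the binning, so a reviewer can only compare values that were computed the same way. A reliability plot next to the number makes that dependence visible.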

Worked example

Sensitivity: 94.1% (92.7–95.3)
Specificity: 89.6% (88.4–90.7)
Sens. (female ≥65): 91.0% (87.8–93.6), flagged for monitoring
External cohort: n=2,104, EU site, sens. 92.4% / spec. 88.1%
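
Intervals like those above need a stated method. One common choice for a proportion such as sensitivity is the Wilson score interval, sketched here. The counts are hypothetical and are not the ones behind the worked example, which does not state its test-set size or interval method.

```python
import math


def wilson_interval(successes, total, z=1.959964):
    """Wilson score interval for a binomial proportion (default z gives 95%)."""
    if total <= 0 or not 0 <= successes <= total:
        raise ValueError("need total > 0 and 0 <= successes <= total")
    p = successes / total
    denom = 1.0 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = (z / denom) * math.sqrt(
        p * (1 - p) / total + z * z / (4 * total * total)
    )
    return centre - half, centre + half


# Hypothetical counts: 941 true positives among 1,000 ICH-positive studies.
low, high = wilson_interval(941, 1000)
```

Run the same function per subgroup row. Small subgroups produce wide intervals, and that width is the information a reviewer is looking for.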

§04

Limitations & failure modes

Documenting where the model breaks is expected under the FDA transparency guiding principles and is a transparency obligation under EU AI Act Article 13.

Source · EU AI Act, consolidated text (Article 13) ↗

Required fields

Documented failure modes
Populations / settings where performance is degraded
Open bugs and CAPAs disclosed to deployers
Behaviour on out-of-distribution inputs

Worked example

Failure mode: Motion-degraded studies → confidence < 0.6 returned with abstention
Degraded setting: Non-contrast studies from scanners <16-slice not validated; device blocks inference
OOD: Paediatric input → input-validation error, no score returned
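
The behaviours in this worked example can be written as an explicit gate, which also gives testers something concrete to verify. The sketch below uses hypothetical type and function names. It reads "confidence < 0.6 returned with abstention" as: the device reports the confidence but withholds the score. In a real device the scope check would run before inference; here the score is passed in to keep the example short.

```python
from dataclasses import dataclass
from typing import Optional

# Limits copied from the worked examples; all names are hypothetical.
MIN_AGE_YEARS = 18
MIN_SLICE_COUNT = 16
ABSTAIN_BELOW = 0.6


class OutOfScopeInput(ValueError):
    """Input falls outside the validated intended use; no score is produced."""


@dataclass(frozen=True)
class Study:
    patient_age_years: int
    scanner_slice_count: int
    contrast_enhanced: bool


@dataclass(frozen=True)
class TriageResult:
    score: Optional[float]  # None when the device abstains
    confidence: float
    abstained: bool


def check_in_scope(study: Study) -> None:
    """Reject studies the card lists as out of scope or not validated."""
    if study.patient_age_years < MIN_AGE_YEARS:
        raise OutOfScopeInput("paediatric study")
    if study.contrast_enhanced:
        raise OutOfScopeInput("contrast-enhanced study")
    if study.scanner_slice_count < MIN_SLICE_COUNT:
        raise OutOfScopeInput("scanner below 16-slice is not validated")


def gate(study: Study, score: float, confidence: float) -> TriageResult:
    """Apply scope checks first, then abstain on low confidence."""
    check_in_scope(study)
    if confidence < ABSTAIN_BELOW:
        return TriageResult(score=None, confidence=confidence, abstained=True)
    return TriageResult(score=score, confidence=confidence, abstained=False)
```

Keeping the limits as named constants lets the card and the code be checked against each other line by line.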

§05

Cybersecurity posture

AI inherits every classical software threat and adds adversarial inputs, prompt injection, and model extraction. Both FDA and EU MDR Annex I §17 expect explicit handling.

Source · FDA, Premarket cybersecurity guidance (2023) ↗

Required fields

SBOM including model weights + inference runtime
Threat model + risk register reference
Adversarial-robustness testing for the modality in scope
Coordinated vulnerability disclosure contact

Worked example

SBOM: SPDX 2.3 generated per release; weights pinned by SHA-256
Adversarial: FGSM + PGD evaluated; degradation < 3% at ε=2/255
CVD: security@vendor.example, 90-day disclosure window
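
Pinning weights by SHA-256 only helps if the runtime checks the digest before loading. A minimal sketch using the Python standard library follows. The function names are illustrative, and the pinned digest is assumed to come from the release's SBOM or manifest.

```python
import hashlib
import hmac


def sha256_of_file(path, chunk_size=1 << 20):
    """Stream the file so large weight files are never fully held in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_weights(path, pinned_hex):
    """Return True only if the file's SHA-256 matches the pinned digest."""
    actual = sha256_of_file(path)
    # compare_digest does a constant-time comparison of the two hex strings.
    return hmac.compare_digest(actual, pinned_hex.strip().lower())
```

Refuse to start inference when `verify_weights` returns False, and log the mismatch as a security event, not as a generic load error.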

§06

Lifecycle & change control (PCCP)

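FDA's August 2025 final PCCP guidance describes how to pre-specify the modifications a device may receive without a new marketing submission. Changes that fall outside the authorised PCCP are assessed under the usual device-modification rules and may need a new submission.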

Source · FDA, PCCP final guidance (Aug 2025) ↗

Required fields

PCCP on file (yes / no) + reference document ID
Modification protocol: what can change, how, with what limits
Performance monitoring metrics + alert thresholds
Drift detection method
Rollback path

Worked example

PCCP scope: Re-training on +20% data per quarter; threshold tuning ±0.05
Monitoring: Weekly sensitivity by site; alert at −2σ from baseline
Rollback: Previous model retained for 24 months; rollback ≤ 4h
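
The monitoring rule and the threshold-tuning limit above are both simple enough to encode and test. This sketch assumes the baseline is a list of weekly sensitivity values for one site and that σ is the sample standard deviation of that baseline. Both are assumptions, as are the function names, since the worked example does not define them.

```python
import statistics


def sensitivity_alert(baseline_weeks, current, n_sigma=2.0):
    """True when this week's sensitivity is more than n_sigma below the baseline mean."""
    if len(baseline_weeks) < 2:
        raise ValueError("need at least two baseline weeks to estimate sigma")
    mean = statistics.mean(baseline_weeks)
    sigma = statistics.stdev(baseline_weeks)  # sample standard deviation
    return current < mean - n_sigma * sigma


def threshold_within_pccp(authorised, proposed, max_delta=0.05):
    """True when a proposed operating threshold stays within ±max_delta."""
    # Small tolerance so a change of exactly max_delta is not rejected by rounding.
    return abs(proposed - authorised) <= max_delta + 1e-12


# Hypothetical six-week baseline for one site.
baseline = [0.94, 0.95, 0.93, 0.94, 0.95, 0.94]
alert = sensitivity_alert(baseline, 0.91)
```

Call `sensitivity_alert` once per site each week. Pooling sites first can hide a drop at a single site, and a by-site rule exists to catch exactly that.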

Take it with you

The full twelve-section checklist, as Markdown.

Drop it in your DMS, paste it into your QMS, or hand it to the model team. Adapted from the same primary sources cited across the rest of this site.