Joint Commission and CHAI just set the tone for AI governance — and most hospitals can’t answer basic questions about how their models work.

Surveyors won’t start by asking “Which AI do you use?” They’ll ask, “Show me how you govern it, monitor drift, and decide who’s accountable when it fails.”
Most hospitals can’t answer those questions.
I watched a CMIO try to explain their sepsis prediction model to a medical staff committee last month. The questions were basic:
- What data was it trained on?
- How often do you retrain it?
- What’s the false negative rate for patients over 75?
- Who reviews the predictions?
- What happens when it’s wrong?
He couldn’t answer any of them. The vendor documentation said “clinically validated” and “FDA cleared” but contained zero details about training data, performance by subgroup, or operational governance.
The committee voted to suspend use of the model until proper documentation could be provided.
This is happening everywhere. Joint Commission is developing AI-specific standards. Coalition for Health AI (CHAI) released governance frameworks. State medical boards are starting to ask questions about AI in clinical workflows.
By 2026, “we have AI” won’t be enough. You’ll need to prove you know how it works, how you monitor it, and who’s accountable when it fails.
The documentation you need is called a model card. And almost no one has one.
What Joint Commission and CHAI Are Actually Requiring
The regulatory landscape for healthcare AI is shifting from “innovation-friendly ambiguity” to “show us your governance.”
Joint Commission’s Direction
Joint Commission hasn’t released final AI standards yet, but its pilot site visits are already testing questions like these:
Governance structure:
- Who’s accountable for AI system performance?
- How are AI decisions integrated into clinical workflows?
- What’s the escalation path when AI recommendations are questioned?
Safety monitoring:
- How do you track AI system performance over time?
- What triggers a review or retrain?
- How are adverse events related to AI captured and analyzed?
Clinical validation:
- What evidence supports the AI’s clinical utility?
- How was it validated on your specific patient population?
- Are performance metrics stratified by patient subgroups?
Staff competency:
- How are clinicians trained on AI system use?
- How do you assess whether staff understand AI limitations?
- What’s the process for reporting AI-related concerns?
CHAI’s Framework
The Coalition for Health AI published its governance framework in 2024. Key requirements:
Transparency:
- Document what the AI does and how it works
- Explain intended use and limitations
- Disclose training data characteristics
Accountability:
- Assign clear ownership for AI system oversight
- Define decision-making authority
- Establish incident response procedures
Equity:
- Monitor performance across patient subgroups
- Identify and mitigate bias
- Ensure fair access to AI-enabled care
Performance monitoring:
- Track accuracy metrics continuously
- Detect and respond to model drift
- Report performance to stakeholders
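If you’re wondering what “detect and respond to model drift” looks like in practice, here’s a minimal sketch using the Population Stability Index to compare a feature’s current distribution against its training baseline. The feature, bin count, and the 0.2 alert threshold are illustrative assumptions, not CHAI requirements:

import numpy as np

def population_stability_index(baseline, current, bins=10):
    """Compare the current distribution of one input feature against its training baseline."""
    edges = np.quantile(baseline, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf              # catch values outside the training range
    expected, _ = np.histogram(baseline, bins=edges)
    actual, _ = np.histogram(current, bins=edges)
    expected_pct = np.clip(expected / expected.sum(), 1e-6, None)
    actual_pct = np.clip(actual / actual.sum(), 1e-6, None)
    return float(np.sum((actual_pct - expected_pct) * np.log(actual_pct / expected_pct)))

# Toy stand-ins for the training window and this month's lactate values
rng = np.random.default_rng(0)
training_lactate = rng.normal(2.0, 1.0, 5000)
current_lactate = rng.normal(2.6, 1.2, 800)

psi = population_stability_index(training_lactate, current_lactate)
if psi > 0.2:   # common rule of thumb: >0.2 suggests a shift worth reviewing
    print(f"Input drift detected (PSI = {psi:.2f}) - flag for model review")

The specific check matters less than the fact that someone owns it, runs it on a schedule, and knows what happens when it fires.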
What State Medical Boards Are Asking
Some states are ahead of federal regulation:
California Medical Board: Requires physicians using AI in clinical decisions to document:
- What AI system was used
- How the recommendation was interpreted
- Clinical reasoning for final decision
New York State Department of Health: AI systems must:
- Be validated on representative patient populations
- Have documented governance processes
- Include human oversight mechanisms
The Model Card: What It Is and Why You Need One
A model card is structured documentation that describes:
- What the model does
- How it was built
- How well it works
- What its limitations are
- How it should (and shouldn’t) be used
Think of it as a nutrition label for AI models.
The Essential Components
Based on CHAI guidelines and Joint Commission pilot questions, here’s what your model card needs:
Section 1: Model Overview
What to document:
- Model name and version
- Intended use (clinical purpose, target population, care setting)
- Developer (vendor or internal team)
- Deployment date and current version
- Regulatory status (FDA cleared, clinical decision support, wellness tool)
Example:
Model: SepsisGuard v2.3
Purpose: Early prediction of sepsis risk in hospitalized adult patients
Developer: HealthTech AI (deployed under internal clinical validation)
Deployment: March 2024 (current version), initially deployed June 2023
Regulatory: Clinical decision support tool (non-diagnostic)
Setting: Inpatient wards and ICU, adult patients only
Section 2: Training Data
What to document:
- Data sources (which EHRs, which institutions, date range)
- Population characteristics (demographics, conditions, acuity)
- Sample size and class balance
- Inclusion/exclusion criteria
- Known limitations or biases in training data
Example:
Training Data: Epic EHR data from 3 academic medical centers
Date range: January 2021 - December 2023
Sample size: 47,000 patient admissions
- Sepsis cases: 3,200 (6.8%)
- Non-sepsis: 43,800 (93.2%)
Demographics:
- Age: Mean 58 (range 18-95)
- Sex: 52% female, 48% male
- Race/ethnicity: 62% White, 18% Black, 12% Hispanic, 8% Other
- Payer: 45% Medicare, 35% Commercial, 15% Medicaid, 5% Other
Exclusions: Pediatric patients (<18), obstetric admissions, patients with DNR orders at admission, admissions <24 hours
Known limitations: Under-represents rural patient populations, limited data from community hospitals, may not generalize to post-acute settings
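Most of the Section 2 numbers can be pulled straight from your admissions extract. Here’s a minimal sketch, assuming a pandas DataFrame with hypothetical column names (sepsis_label, age, sex, payer) rather than any real schema:

import pandas as pd

# Toy stand-in for the real EHR extract; swap in your own query results
admissions = pd.DataFrame({
    "sepsis_label": [1, 0, 0, 0, 1, 0],
    "age": [71, 45, 63, 58, 80, 34],
    "sex": ["F", "M", "F", "F", "M", "F"],
    "payer": ["Medicare", "Commercial", "Medicare", "Medicaid", "Medicare", "Commercial"],
})

summary = {
    "sample_size": len(admissions),
    "sepsis_rate_pct": round(100 * admissions["sepsis_label"].mean(), 1),   # class balance
    "age_mean": round(admissions["age"].mean(), 1),
    "age_range": (int(admissions["age"].min()), int(admissions["age"].max())),
}
for col in ["sex", "payer"]:
    # percentage breakdown for each demographic field in the card
    summary[col] = (admissions[col].value_counts(normalize=True) * 100).round(1).to_dict()

print(summary)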
Section 3: Model Architecture and Features
What to document:
- Model type (logistic regression, random forest, neural network, LLM)
- Input features (what data the model uses)
- Output format (probability score, classification, recommendation)
- How predictions are generated
Example:
Model type: Gradient boosted decision tree (XGBoost)
Input features (62 total):
- Vital signs: HR, RR, BP, Temp, SpO2 (most recent + trends)
- Labs: WBC, lactate, creatinine, bilirubin, platelets
- Clinical: Age, comorbidities (Elixhauser), infection flags
- Medications: Antibiotics, vasopressors (current + 24hr history)
- Encounter: Admission source, length of stay
Output: Probability score (0-100%) indicating sepsis risk within 6 hours
- Low risk: <20%
- Moderate risk: 20-50%
- High risk: >50%
Prediction frequency: Updates every 4 hours with new data
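The risk bands are just thresholds applied to the model’s probability output. A small sketch of that mapping, with the model call left as a placeholder and the 20%/50% cut points mirroring the example above:

def risk_band(probability: float) -> str:
    """Map the model's sepsis probability to the documented action bands."""
    if probability < 0.20:
        return "low"
    if probability < 0.50:
        return "moderate"
    return "high"

# In production this would be something like:
#   probability = model.predict_proba(features)[0, 1]   # sklearn/XGBoost-style API
print(risk_band(0.37))   # -> "moderate"

Documenting the thresholds this explicitly matters because they are where clinical workflow decisions get encoded, and they are the first thing a committee will ask to see.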
Section 4: Performance Metrics
What to document:
- Primary metrics (accuracy, AUC, sensitivity, specificity)
- Performance on validation set
- Performance by patient subgroup
- Comparison to baseline or alternative approaches
- Confidence intervals
Example:
Validation Performance (hold-out test set, n=10,000):
- AUC: 0.87 (95% CI: 0.85-0.89)
- Sensitivity: 85% (at 50% threshold)
- Specificity: 82%
- PPV: 24% (reflects 6.8% prevalence)
- NPV: 99%
Performance by subgroup:
- Age <65: AUC 0.85
- Age ≥65: AUC 0.88
- Male: AUC 0.86
- Female: AUC 0.87
- White: AUC 0.88
- Black: AUC 0.84
- Hispanic: AUC 0.86
- Medicare: AUC 0.87
- Medicaid: AUC 0.83
- Commercial: AUC 0.86
Comparison: Historical clinician detection sensitivity ~65% (based on chart review audit)
Known performance gaps: Lower accuracy for Black and Medicaid patients, under investigation
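Subgroup stratification is the part most teams skip, and it’s only a few lines of code. Here’s a sketch using scikit-learn’s roc_auc_score on a toy validation frame; the DataFrame and its column names are assumptions, not a real dataset:

import pandas as pd
from sklearn.metrics import roc_auc_score

# Toy stand-in for the scored hold-out set; replace with your validation data
results = pd.DataFrame({
    "y_true":  [1, 0, 0, 1, 0, 1, 0, 0],
    "y_score": [0.81, 0.22, 0.35, 0.62, 0.18, 0.44, 0.57, 0.09],
    "race_ethnicity": ["White", "White", "Black", "Black",
                       "Hispanic", "Hispanic", "White", "Black"],
})

print("Overall AUC:", round(roc_auc_score(results["y_true"], results["y_score"]), 3))
for group, frame in results.groupby("race_ethnicity"):
    if frame["y_true"].nunique() == 2:   # AUC is undefined for single-class subgroups
        print(group, round(roc_auc_score(frame["y_true"], frame["y_score"]), 3))

The same loop works for age bands, sex, and payer; the hard part is deciding in advance what gap between subgroups triggers a review.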
Section 5: Intended Use and Limitations
What to document:
- Appropriate use cases
- Clinical scenarios where model should NOT be used
- Required human oversight
- Known failure modes
Example:
Intended use:
- Early warning system for sepsis risk in hospitalized adults
- Triggers nurse assessment and physician notification
- Supports clinical decision-making (does not replace judgment)
Contraindications:
- DO NOT use for pediatric patients (not validated)
- DO NOT use for obstetric patients (excluded from training)
- DO NOT use as sole basis for treatment decisions
- DO NOT use for patients with DNR orders (palliative context differs)
Required oversight:
- All high-risk alerts (>50%) reviewed by RN within 1 hour
- Physician assessment required before sepsis protocol initiation
- Clinical judgment overrides model in all cases
Known limitations:
- May under-predict in immunocompromised patients
- Performance degrades with missing lab values
- Not validated for post-surgical patients <24 hours post-op
- May generate false positives in patients with chronic inflammatory conditions
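Contraindications are easier to enforce in the scoring pipeline than in a policy document. A sketch of an eligibility gate, with hypothetical field names on the patient record:

def eligible_for_sepsis_model(patient: dict) -> tuple[bool, str]:
    """Return (eligible, reason) before any prediction is generated or displayed."""
    if patient["age"] < 18:
        return False, "pediatric patient - model not validated"
    if patient.get("is_obstetric"):
        return False, "obstetric admission - excluded from training data"
    if patient.get("dnr_at_admission"):
        return False, "DNR at admission - outside intended use"
    return True, "eligible"

print(eligible_for_sepsis_model({"age": 64, "is_obstetric": False, "dnr_at_admission": False}))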
Section 6: Deployment and Monitoring
What to document:
- How the model is integrated into clinical workflow
- Who sees predictions and how they’re displayed
- What actions are triggered by predictions
- How performance is monitored
- Retraining schedule
Example:
Deployment:
- Model runs every 4 hours on all inpatient admissions
- Predictions displayed in EHR (Epic) nursing flowsheet
- High-risk alerts sent to charge nurse via Vocera
- Documented in patient chart (discrete field)
Workflow integration:
- Low risk (<20%): No action required
- Moderate risk (20-50%): Nurse assessment, document findings
- High risk (>50%): Nurse assessment + physician notification + consider sepsis protocol
Monitoring:
- Performance metrics reviewed monthly (sensitivity, specificity, AUC)
- Stratified by age, sex, race, payer
- Input data distribution monitored for drift
- Clinical feedback collected via EHR feedback button
Retraining:
- Scheduled: Quarterly
- Triggered: If sensitivity drops below 80% or significant drift detected
- Last retrain: September 2024
- Next scheduled: December 2024
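The retraining trigger above reduces to a simple check over each review window. A sketch of that logic, with the metrics source and alerting hook left out; only the threshold comparison is shown:

def check_retrain_trigger(true_positives: int, false_negatives: int, floor: float = 0.80) -> bool:
    """True if sensitivity over the review window has fallen below the documented floor."""
    confirmed_cases = true_positives + false_negatives
    if confirmed_cases == 0:
        return False          # no confirmed cases this window; nothing to evaluate
    sensitivity = true_positives / confirmed_cases
    if sensitivity < floor:
        print(f"Sensitivity {sensitivity:.0%} is below {floor:.0%} - open a retrain/rollback review")
        return True
    return False

# e.g., last month's confirmed sepsis cases: 41 flagged by the model, 12 missed
check_retrain_trigger(true_positives=41, false_negatives=12)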
Section 7: Governance and Accountability
What to document:
- Who owns the model
- Who’s responsible for monitoring
- Escalation procedures
- Incident response plan
Example:
Ownership:
- Executive sponsor: CMIO
- Clinical lead: Dr. Sarah Chen, Critical Care
- Technical lead: John Martinez, Senior Data Scientist
- Operations: Sepsis Steering Committee
Monitoring responsibilities:
- Daily: Automated performance dashboard (Data Science team)
- Weekly: Alert volume and override rate review (Nursing leadership)
- Monthly: Performance metrics review (Sepsis Steering Committee)
- Quarterly: External audit (Clinical Informatics + Quality)
Escalation:
- Performance degradation: Data Science → Clinical lead → CMIO
- Safety concerns: Any clinician → Patient Safety → CMIO
- Technical issues: IT Help Desk → Data Science → IT Leadership
Incident response:
- Suspected patient harm: Immediate model suspension, patient safety investigation
- Performance drift: Expedited retrain or model rollback
- Technical failure: Failover to manual process, root cause analysis
What Hospitals Actually Have vs. What They Need
I audited documentation for 12 healthcare AI systems across 4 health systems. Here’s the gap:
What They Have
Vendor-provided documentation:
- Marketing materials (“AI-powered clinical decision support”)
- FDA 510(k) clearance letter (if applicable)
- Basic user manual (“how to interpret scores”)
- IT integration guide (“API endpoints and data formats”)
Internal documentation:
- Purchase order and contract
- IT security review
- Go-live announcement email
- Maybe a few PowerPoint slides from implementation
What They Don’t Have
Training data details:
- 2 out of 12 could describe training data characteristics
- 0 out of 12 had demographic breakdown of training data
- 0 out of 12 knew if training data matched their patient population
Performance metrics by subgroup:
- 1 out of 12 had stratified performance data
- 11 out of 12 only had overall accuracy numbers
- 0 out of 12 monitored subgroup performance post-deployment
Operational governance:
- 3 out of 12 had assigned accountability for model oversight
- 1 out of 12 had documented monitoring procedures
- 0 out of 12 had retraining schedules
Clinical integration details:
- 8 out of 12 couldn’t explain how predictions affected clinical workflow
- 5 out of 12 couldn’t identify who was responsible for acting on predictions
- 4 out of 12 didn’t know if clinicians could override the AI
The gap: Organizations have enough documentation to deploy the AI, but not enough to govern it responsibly.
The Incidents That Prove You Need This
Incident 1: The Model Nobody Could Explain
A hospital medical staff committee reviewed an AI sepsis model after a physician complained it was “always wrong.”
Committee questions:
- What’s the false negative rate?
- Does it work equally well for all patients?
- How was it validated?
CMIO response: “I’ll get back to you.”
What happened:
- CMIO contacted vendor, vendor provided marketing materials
- No performance data by subgroup available
- No documentation of validation on hospital’s specific population
- Committee voted to suspend use until proper validation completed
Timeline: 3 months to gather documentation, re-validate on internal data, and resume use
Cost: Lost productivity, staff frustration, damaged credibility of AI program
Incident 2: The Bias Discovered Too Late
A health system deployed a readmission prediction model to prioritize care management outreach.
What happened:
- Model used for 18 months before anyone checked subgroup performance
- External audit revealed systematic under-prediction for Black patients
- Model routed fewer Black patients to high-risk care management
- Potential disparate impact on health outcomes
Questions from legal and compliance:
- When was the model last audited for bias?
- Who was monitoring subgroup performance?
- What’s the process for detecting issues like this?
Answer: “We didn’t know we needed to monitor that.”
Impact: Model suspended, full audit of all AI systems required, OCR complaint filed by community advocacy group
Incident 3: The Model That Drifted Unnoticed
A diagnostic imaging AI for pneumonia detection had been in use for two years.
What happened:
- Radiologists increasingly overriding AI recommendations
- Nobody tracking override rate systematically
- Performance analysis revealed sensitivity dropped from 92% to 78%
Root cause investigation questions:
- When was the model last validated?
- Who’s responsible for monitoring performance?
- What’s the retraining schedule?
Answer: “We thought the vendor was monitoring it.”
Reality: Vendor provided the model, hospital responsible for monitoring. No one was watching.
How to Build Your Model Card This Week
You don’t need to document everything perfectly. Start with what you can answer today and identify gaps.
Day 1: Model Inventory
List every AI/ML model in clinical use:
- Model name
- What it predicts/recommends
- Where it’s deployed (which units, which workflows)
- Who uses it (clinicians, staff, automated systems)
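Even a spreadsheet works for Day 1, but a structured record keeps the inventory consistent across teams. One way to capture it, sketched with illustrative field values:

from dataclasses import dataclass, asdict
import csv

@dataclass
class ModelInventoryEntry:
    name: str
    predicts: str        # what it predicts or recommends
    deployed_where: str  # units / workflows
    used_by: str         # clinicians, staff, or automated systems
    status: str = "active"

inventory = [
    ModelInventoryEntry("SepsisGuard v2.3", "sepsis risk within 6 hours",
                        "inpatient wards, ICU", "nurses, hospitalists"),
]

with open("ai_model_inventory.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=list(asdict(inventory[0]).keys()))
    writer.writeheader()
    writer.writerows(asdict(entry) for entry in inventory)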
Day 2: Gather Existing Documentation
For each model, collect:
- Vendor documentation (user guides, technical specs, FDA clearance)
- Internal documentation (implementation plans, training materials)
- Contract and procurement records
- Any validation studies or performance reports
Day 3: Identify What You Know
For each model, answer what you can:
- What data was it trained on? (even if just “vendor didn’t tell us”)
- What’s the accuracy? (overall, even if not stratified)
- How is it used clinically? (who sees it, what do they do with it)
- Who’s responsible for it? (even if not formally assigned)
Day 4: Identify What You Don’t Know
For each model, list what you can’t answer:
- Training data characteristics?
- Performance by patient subgroup?
- Monitoring procedures?
- Retraining schedule?
- Accountability structure?
Day 5: Prioritize Gaps by Risk
Rank models by:
- Clinical risk (high-stakes decisions vs. low-risk support)
- Usage volume (affects many patients vs. niche use)
- Regulatory exposure (diagnostic vs. wellness)
Focus documentation efforts on highest-risk models first.
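One lightweight way to do the Day 5 ranking: score each model on the three factors and sort. The 1-3 scales and the extra weight on clinical risk are illustrative assumptions, not a standard:

# 1-3 scores per factor; clinical risk weighted double as an illustrative choice
models = [
    {"name": "Sepsis prediction",    "clinical_risk": 3, "usage_volume": 3, "regulatory_exposure": 2},
    {"name": "Readmission outreach", "clinical_risk": 2, "usage_volume": 3, "regulatory_exposure": 2},
    {"name": "Pneumonia imaging AI", "clinical_risk": 3, "usage_volume": 2, "regulatory_exposure": 3},
]
for m in models:
    m["priority"] = 2 * m["clinical_risk"] + m["usage_volume"] + m["regulatory_exposure"]

for m in sorted(models, key=lambda m: m["priority"], reverse=True):
    print(m["name"], m["priority"])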
The Template You Can Use Today
Here’s a minimal viable model card template:
MODEL CARD: [Model Name]

BASIC INFORMATION
Model: [Name and version]
Purpose: [What clinical problem does this address?]
Developer: [Vendor or internal team]
Deployment date: [When did we start using it?]
Current status: [Active, suspended, pilot]

TRAINING AND VALIDATION
Training data: [What we know about training data - even if limited]
Validation: [Was it validated? On what population?]
Known limitations: [What doesn't it work for?]

PERFORMANCE
Overall accuracy: [Best metrics available]
Subgroup performance: [If available - if not, note "not yet measured"]
Comparison: [How does it compare to current practice?]

CLINICAL USE
Intended use: [What should it be used for?]
Contraindications: [What should it NOT be used for?]
Workflow: [How is it integrated? Who sees predictions? What actions result?]
Oversight: [What human review is required?]

MONITORING AND GOVERNANCE
Performance monitoring: [How often? Who reviews?]
Retraining: [Schedule or triggers?]
Accountability: [Who owns this model?]
Incident response: [What happens if something goes wrong?]

GAPS IN DOCUMENTATION
[List what we don't know but need to find out]

LAST UPDATED: [Date]
NEXT REVIEW: [Date]
This isn’t perfect, but it’s infinitely better than “we have AI and we think it works.”
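If a machine-readable version helps, the same minimal card can live as one JSON file per model. A sketch; the field names simply mirror the template above and the filename is a placeholder:

import json

model_card = {
    "model": "[Name and version]",
    "purpose": "[What clinical problem does this address?]",
    "developer": "[Vendor or internal team]",
    "training_data": "[What we know - even if limited]",
    "overall_accuracy": "[Best metrics available]",
    "subgroup_performance": "[If not measured, say 'not yet measured']",
    "intended_use": "[What should it be used for?]",
    "contraindications": "[What should it NOT be used for?]",
    "accountability": "[Who owns this model?]",
    "documentation_gaps": ["[What we don't know but need to find out]"],
    "last_updated": "[Date]",
    # ...remaining fields follow the template above
}

with open("model_card_example.json", "w") as f:
    json.dump(model_card, f, indent=2)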
What Surveyors Will Actually Ask
Based on Joint Commission pilot visits and CHAI assessments:
Question 1: “Show me your AI inventory.”
What they want: List of all AI systems in clinical use with basic details (purpose, location, status)
What they’ll flag: “We’re not sure what counts as AI” or “We don’t have a comprehensive list”
Question 2: “Pick one. How do you know it works?”
What they want: Evidence of validation, performance monitoring, accountability
What they’ll flag: “The vendor said it’s validated” with no internal verification
Question 3: “How do you monitor for bias or disparities?”
What they want: Process for checking performance across patient subgroups
What they’ll flag: “We track overall accuracy” with no subgroup analysis
Question 4: “What happens when it’s wrong?”
What they want: Incident response plan, accountability structure, process improvement
What they’ll flag: “That hasn’t happened” (it has, you just don’t know about it)
Question 5: “How do clinicians know when to trust it?”
What they want: Training program, competency assessment, clear guidance on appropriate use
What they’ll flag: “We sent an email when we launched it” with no ongoing training
The Question Your CMIO Should Be Asking
If you’re a hospital executive or board member, ask your CMIO:
“For our three highest-risk AI systems, can you show me the model card?”
Good answer: Pulls up structured documentation covering training data, performance metrics, monitoring procedures, and governance.
Bad answer: “What’s a model card?” or “We have the vendor documentation” or “I’ll have to get back to you.”
If they can’t produce a model card, you have undocumented AI making clinical decisions in your hospital.
That’s not an innovation problem. It’s a governance problem.
And in 2026, it’s going to be a regulatory problem.
What to Do Right Now
This week:
Create model cards for your three highest-risk AI systems. Even incomplete cards are better than nothing.
This month:
Assign accountability. Every AI system needs an owner who’s responsible for monitoring and governance.
This quarter:
Build the infrastructure. You need processes for performance monitoring, drift detection, incident response, and retraining.
By 2026:
You need to be able to answer every question on this list for every AI system in clinical use. Because surveyors will be asking.
Building the governance infrastructure AI needs before regulators require it. Every Tuesday and Thursday in Builder’s Notes.
Running AI in clinical workflows? Drop a comment with your biggest documentation gap — I’ll tell you what surveyors will flag.
Piyoosh Rai is Founder & CEO of The Algorithm, building native-AI platforms for healthcare, financial services, and government sectors. After watching a CMIO unable to answer basic questions about their AI systems in a medical staff meeting, he writes about the governance documentation that separates responsible AI deployment from regulatory nightmares.
Further Reading
For monitoring AI model performance over time:
The Builder’s Notes: Your “Cutting-Edge” Healthcare AI Is Already Out of Date