Abstract
Background:
Artificial intelligence (AI) platforms can potentially enhance clinical decision-making (CDM) in primary care settings. OpenEvidence (OE), an AI tool, draws from trusted sources to generate evidence-based medicine (EBM) recommendations to address clinical questions. However, its effectiveness in real-world primary care cases remains unknown.
Objective:
To evaluate the performance of OE in providing EBM recommendations for five common chronic conditions in primary care: hypertension, hyperlipidemia, diabetes mellitus type 2, depression, and obesity.
Methods:
Five patient cases were retrospectively analyzed. Physicians posed specific clinical questions, and OE responses were evaluated on clarity, relevance, evidence support, impact on CDM, and overall satisfaction. Four independent physicians provided ratings using a 0 to 4 scale.
Results:
OE provided accurate, evidence-based recommendations in all cases, aligning with physician plans. OE was scored on a scale of zero to four, where zero was very unclear, and four was very clear. Mean scores across cases were clarity (3.55 ± 0.60), relevance (3.75 ± 0.44), support (3.35 ± 0.49), and satisfaction (3.60 ± 0.60). However, the impact on CDM was limited (1.95 ± 1.05), as OE primarily reinforced rather than modified plans.
Conclusion:
OE was rated high in clarity, relevance, and evidence-based support, reinforcing physician decisions in common chronic conditions. While the impact on CDM was minimal due to the study’s retrospective nature, OE shows promise in augmenting the primary care physician. Prospective trials are needed to evaluate its utility in complex cases and multidisciplinary settings.
Keywords: artificial intelligence (AI), clinical decision-making, primary care, OpenEvidence, chronic disease management
Introduction
With the widespread implementation of electronic health record (EHR) systems, the potential for generating large amounts of data has set the stage for using large language models (LLMs) and developing artificial intelligence (AI) tools to improve patient care. 1 A recent scoping review found that AI utilization in primary care clinics (PCCs) is in the early stages of development, with most studies focusing on AI methods to support physician clinical decision-making (CDM) with diagnostic and treatment recommendations, particularly in the management of chronic conditions.1,2 In addition, AI has assisted with administrative tasks such as burdensome EHR clinical documentation, including converting clinicians’ voices in the exam room to clinical notes in the EHR. 2 Despite these early successful AI PCC applications, significant clinical challenges persist. One concern of general practitioners is the reliability of AI in CDM, as errors could lead to misdiagnosis or inappropriate treatment. 3 AI systems can perpetuate or even exacerbate existing biases in healthcare. 4 This is particularly concerning in primary care, where diverse patient populations are served. The potential for algorithmic bias and inequity in AI applications is a significant barrier. 4 Another challenge of applying AI in CDM is the high degree of complexity and the chronic conditions often seen in PCCs. AI has been less successful when applied to complex single disease states or situations with multiple complex conditions, further magnifying health inequities between clinicians in specialties and those in primary care. 2
The role of AI in augmenting physician CDM in the management of chronic conditions for individual patients in the PCC is still largely unknown. 5 LLM-based platforms have the potential to answer specific questions that can help individualize care. 6 Several clinical studies have evaluated ChatGPT (OpenAI, San Francisco, CA, USA) and its role in CDM.5,7 A recent trial examined the accuracy of ChatGPT in developing differential diagnoses (DD) and final diagnoses (FD) for 36 published vignettes. 7 ChatGPT’s DD accuracy (60.3%; 95% CI 54.2%-66.6%) was lower than its FD accuracy (76.9%; 95% CI 67.8%-86.1%). 6 The authors cited ChatGPT’s strong performance as encouraging, with the main limitations being possible model hallucinations and unclear composition of the training dataset. 6 More data are needed regarding how these LLM tools perform on CDM in actual PCC patients with the complex chronic conditions that physicians often face.
Five of the more common chronic conditions treated in a PCC are hypertension (HTN), hyperlipidemia (HLD), osteoarthritis (OA), diabetes mellitus type 2 (DM2), and depression. 8 One common condition that is often overlooked in the diagnosis and treatment plan in PCCs is obesity. Despite a prevalence of just over 40% among US adults, several studies show that only 23% to 30% of patients carry a diagnosis of obesity in PCC notes.9,10 The present study evaluated the CDM capabilities of the novel AI platform OpenEvidence (Cambridge, MA) in a PCC using five clinical cases of patients with five common chronic diseases (HTN, HLD, DM2, depression, and obesity). 11 Despite the platform’s widespread use by physicians in over 7000 care centers across the US, there is a paucity of data on its accuracy in CDM. 11
Methods
All patients in this report provided general research authorization, which, under our Institutional Review Board (IRB) policy, allows for research focused on a retrospective review of the EHR. The current study used AI “off-label” and retrospectively for practice improvement. A protocol was developed by asking physicians in our primary care specialties (general internal medicine (GIM) and family medicine) to retrospectively identify five patients evaluated in internal or family medicine clinics who posed challenges in the five chronic diseases (HTN, HLD, DM2, depression, and obesity). During the initial visits, these five cases were managed with typical decision-making tools (e.g., UpToDate, AskMayoExpert, and textbooks) and physician knowledge, without AI platforms (e.g., ChatGPT, Copilot, OpenEvidence). For each of the five cases, the physician who initially evaluated the patient was asked to review the records retrospectively and pose a specific patient-centric point-of-care question (without using protected health information) about the challenging aspect of the case. The physician was then asked to identify the area in which the AI assisted CDM (clinical manifestations, diagnosis, or treatment), how the application of the AI could have impacted the plan (i.e., modified or reinforced it), and whether they felt the information and references were accurate based on their knowledge of the common chronic medical conditions. Four physician authors (LAG, CAA, CRS, MSM) who, unlike the evaluating physicians (RTH and JEV), had not previously evaluated the cases reviewed the cases, questions, responses, and physician ratings. For this review, we utilized a five-question assessment tool covering clarity of recommendations, relevance, evidence-based support, impact on CDM, and overall satisfaction.
OpenEvidence Background: OpenEvidence (OE) is a novel AI-powered platform designed to assist clinicians (with NPI numbers), with a stated goal of providing accurate, evidence-based answers to point-of-care questions. 11 OE was developed by Harvard and MIT researchers who utilized the Mayo Clinic Platform Accelerate program to integrate novel AI technologies in healthcare. According to the company, OE was the first AI system to score above 90% on the United States Medical Licensing Examination (USMLE). 11
Case 1: Hyperlipidemia
The patient is a 37-year-old white male with a relevant past medical history of treated HTN, bipolar disorder, and borderline dyslipidemia, who presented for a physical exam in the ambulatory GIM clinic. He is the former owner of a construction business and had several concerns centered primarily around his potential risk for future chronic diseases, including cardiovascular risk. He had been told previously that he had a history of borderline cholesterol and triglycerides, but this was not being treated with pharmacotherapy. He had been diagnosed with HTN and treated with 10 mg of lisinopril and 12.5 mg of hydrochlorothiazide since 2022. His ambulatory blood pressure measurements at home have been consistently between 110 and 130 systolic/80 and 90 diastolic. He could not recall the last measurement of his cholesterol but had been told that his levels had been “borderline,” though not to the point where medical therapy was initiated. Coronary calcification testing (high resolution, non-contrast, multi-slice CT) performed at another healthcare facility 3 months prior to his visit noted a total calcium score of 17 Agatston Units (AU), with the circumflex artery (10 AU) and right coronary artery (7 AU) showing mild calcifications, placing him between the 90th and 100th percentile for males between 0 and 39 years old.
ECG, chest X-ray, and stress ECG performed at our institution were negative. His lipid panel was as follows: total cholesterol 197 mg/dL, LDL 126 mg/dL, HDL 62 mg/dL, triglycerides 48 mg/dL, lipoprotein (a) 48 nmol/L, and apolipoprotein B 105 mg/dL (desirable <90 mg/dL). The remainder of his laboratory work, including complete metabolic profile (CMP), C-reactive protein, and complete blood count, was normal. His blood pressure on the current ambulatory visit was 120/89 mmHg, and his body mass index was 27.65 kg/m².
Physician Plan: Given the elevated cardiovascular risk associated with the coronary calcifications, which placed the patient above the 75th percentile for his age, a statin was recommended. The patient was initiated on rosuvastatin 5 mg daily, with plans for a lipid panel and CMP in 3 to 6 months.
Summary of AI (OE) Response: Figure 1a and b present the retrospective question asked and the response generated from OE. The next step in management for this patient should be to initiate moderate-intensity statin therapy to address his elevated cardiovascular risk.
Figure 1.
(a) Question submitted to the OpenEvidence platform for Case 1: hyperlipidemia and (b) OpenEvidence response for Case 1: hyperlipidemia.
Physician Evaluation of AI (OE): The AI reinforced the decision to start a statin due to elevated CV risk. OE summarized the clinical situation for both the physician and the patient. The cited material was accurate and could help the patient and the physician.
Case 2: Hypertension
The patient is a 49-year-old African American male with a past medical history significant for benign essential HTN, morbid obesity, mixed hyperlipidemia, and prediabetes who presented to the Family Medicine PCC for follow-up of his HTN. He was diagnosed with HTN in 2023 and is on losartan 100 mg and amlodipine 10 mg. Despite this regimen, his blood pressure readings consistently remained above target, and he was started on 12.5 mg of chlorthalidone 2 months ago. Serum potassium measured 2 weeks after initiation of chlorthalidone was 3.1 mmol/L, which improved only to 3.5 mmol/L following supplementation with 20 mEq of oral potassium tablets daily for 1 week. Furthermore, the potassium level dropped to 2.9 mmol/L when chlorthalidone was increased to 25 mg, and the drug had to be stopped. A workup for secondary causes of HTN, including renal artery stenosis, primary aldosteronism, and pheochromocytoma, was unremarkable. He reported no current symptoms suggestive of target organ damage, such as chest pain, dyspnea, or headaches. Vitals obtained during the visit included an average of three blood pressure readings of 148/92 mmHg, heart rate of 85 beats per minute, and BMI of 42.3 kg/m². The physical examination was unremarkable. The patient was counseled extensively on dietary modifications, including the Dietary Approaches to Stop Hypertension (DASH) diet, and increased physical activity.
Physician Plan: Given the persistent elevation in blood pressure despite current therapy, it was decided to initiate spironolactone 25 mg daily as an adjunctive antihypertensive agent. Follow-up for blood pressure monitoring, serum potassium, and renal function was scheduled for 4 weeks. In addition, he was referred to a dietitian for weight management and further evaluation of his cardiovascular risk factors.
Summary of AI (OE) Response: Figure 2a and b present the retrospective question asked and the response generated from OpenEvidence. The next step in the pharmacological management of HTN for this patient, who presents with a blood pressure of 148/92 mmHg despite being on losartan 100 mg and amlodipine 10 mg, would be to add a mineralocorticoid receptor antagonist (MRA), such as spironolactone.
Figure 2.
(a) Question submitted to the OpenEvidence platform for Case 2: hypertension and (b) OpenEvidence response for Case 2: hypertension.
Physician Evaluation of AI (OE): OE reinforced the physician’s decision to start spironolactone, considering the hypokalemia associated with chlorthalidone. The tool also used appropriate guidelines to make the recommendation.
Case 3: Depression
The patient is a 56-year-old female with a past medical history of benign essential HTN, abdominal aortic aneurysm, peripheral artery disease, status post above-knee amputation of the left lower extremity, and chronic obstructive pulmonary disease (COPD) who presented to the Family Medicine PCC for management of anxiety and depression. She was diagnosed with recurrent major depressive disorder 4 years ago following the amputation of her left lower extremity. Her current medications include bupropion 450 mg daily and sertraline 100 mg daily. She has also been following up with a behavioral health counselor every 2 weeks for additional mental health support.
The patient reports a significant worsening of her depressive symptoms over the past several weeks, driven primarily by increasing pain and discomfort in her lower extremities. She shares that she recently declined an invitation from her brother to join a cruise due to her functional limitations, which exacerbated her feelings of sadness, frustration, and anxiety. Since then, she has experienced increased fatigue, poor sleep, and a diminished appetite. She reports a loss of interest in activities she previously enjoyed and states that her current symptoms have made it difficult to engage in her usual routine. In the office today, her PHQ-9 score was 14, consistent with moderately severe depression, and her GAD-7 score was 11, indicating moderate anxiety. Despite these symptoms, she denies any hallucinations, delusions, or homicidal or suicidal ideations. She attributes her challenges primarily to her physical pain and the emotional toll of her functional status.
Physician Plan: Given the worsening of her depressive symptoms and the impact on her quality of life, the patient was counseled on potential adjustments to her medication regimen, including a trial of increasing the sertraline to 200 mg daily. Her bupropion dose was maintained, given its therapeutic potential benefits on energy and mood. She was also referred to a pain management specialist to address the worsening lower extremity pain, as well as to physical therapy to evaluate potential interventions to improve mobility and reduce discomfort.
Summary of AI (OE) Response: The retrospective question asked and the response generated from OpenEvidence are presented in Figure 3a and b. Augmenting with mirtazapine is a safe and evidence-based option for this patient, and increasing the sertraline dose to 200 mg/day is also appropriate. Shared decision-making with the patient is crucial to determine the best course of action.
Figure 3.
(a) Question submitted to the OpenEvidence platform for Case 3: depression and (b) OpenEvidence response for Case 3: depression.
Physician Evaluation of AI (OE): In this case, the AI tool provided safe and evidence-based management options and cited relevant recent studies, giving the clinician further tools to discuss with the patient.
Case 4: Diabetes Mellitus
The patient is a 56-year-old Hispanic male with a past medical history of DM2 (diagnosed in 2005), essential hypertension, morbid obesity (BMI of 44 kg/m²), and gout who presented to the Family Medicine PCC for follow-up after a recent visit to the emergency room (ER). His DM2 has been managed with metformin 1000 mg twice daily and glipizide 20 mg twice daily. However, over the past few months, the patient reports progressive weight gain, which has raised additional concerns regarding glycemic control and cardiovascular health. Two days before the current office appointment, he presented to the ER with worsening fatigue, occasional dyspnea, and orthopnea. He denied paroxysmal nocturnal dyspnea or peripheral edema. At the time of the ER visit, he was diagnosed with heart failure with reduced ejection fraction (HFrEF). Labs included a hemoglobin A1c of 8.6%.
Physician Plan: The patient’s weight gain and ongoing poor glycemic control were discussed in the context of his new diagnosis of HFrEF. Given these findings, adjustments to his diabetes management were necessary, prioritizing agents that could both lower his glucose levels and offer cardiovascular benefits. Glipizide was discontinued, and the patient was started on empagliflozin 10 mg daily, an SGLT2 inhibitor shown to improve glycemic control and offer cardiovascular and renal protection, particularly in patients with HFrEF. Metformin was continued, as his renal function remains stable (eGFR > 60 mL/min/1.73 m²).
The patient was extensively counseled on lifestyle modifications, including weight reduction through a low-carbohydrate, heart-healthy diet and increased physical activity tailored to his functional capacity. A referral to a dietitian was initiated for nutritional guidance, and he was encouraged to monitor his blood glucose levels more closely, especially after the initiation of empagliflozin. To address his HFrEF, the patient was advised to follow up with his cardiologist to optimize heart failure management.
Summary of AI (OE) Response: The retrospective question asked and the response generated from OpenEvidence are presented in Figure 4a and b. The appropriate next step in managing this patient’s type 2 diabetes is to add an SGLT2 inhibitor and consider discontinuing glipizide to optimize both glycemic control and cardiovascular outcomes.
Figure 4.
(a) Question submitted to the OpenEvidence platform for Case 4: diabetes and (b) OpenEvidence response for Case 4: diabetes.
Physician Evaluation of AI (OE): In this case, the AI tool recognized other comorbidities associated with managing a chronic health condition and provided evidence-based recommendations for appropriate management.
Case 5: Obesity
The patient is a 33-year-old white female with a past medical history of recurrent major depressive disorder, generalized anxiety disorder, and morbid obesity (BMI 41.2 kg/m²; height 170 cm, weight 119 kg) who presented to the Family Medicine PCC for follow-up evaluation of weight loss. At her initial consultation 14 months ago, her weight was 117 kg (BMI 40.5 kg/m²). After initiating lifestyle modifications, including dietary changes, consultation with a dietitian, and regular exercise, she achieved a 5 kg weight loss over the first 4 months, bringing her BMI to 39.1 kg/m². Motivated to pursue further weight loss, the patient expressed interest in pharmacotherapy. After a comprehensive evaluation and discussion, the GLP-1 receptor agonist semaglutide (Wegovy; Novo Nordisk, Bagsvaerd, Denmark) was initiated, as there were no contraindications to its use.
The patient tolerated the semaglutide well and was gradually titrated to the maximum dose of 2.4 mg over the subsequent 6 months. During this period, she continued strength training and adhered to dietary modifications, resulting in an additional 10 kg weight loss, bringing her BMI to 35.6 kg/m². However, her weight loss plateaued over the following 4 months despite being on the maximum dosage of semaglutide (2.4 mg) and maintaining her exercise and dietary regimen.
Physician Plan: The patient presented today expressing frustration with her inability to reduce weight. She remains committed to non-invasive methods of weight management and is not interested in pursuing endoscopic or surgical interventions for obesity. Given that she has plateaued on her weight loss efforts with semaglutide, the decision was made to switch her medication regimen to tirzepatide (Zepbound; Eli Lilly, Indianapolis, IN, USA) in addition to reinforcing the importance of continuing lifestyle modifications and engaging in strength training.
Summary of AI (OE) Response: The retrospective question asked and the response generated from OpenEvidence are presented in Figure 5a and b. Given the patient’s weight loss plateau on semaglutide and the evidence supporting tirzepatide’s superior efficacy, switching to tirzepatide could be a beneficial strategy for achieving further weight loss.
Figure 5.
(a) Question submitted to the OpenEvidence platform for Case 5: obesity and (b) OpenEvidence response for Case 5: obesity.
Physician Evaluation of AI (OE): In this case, the AI tool reinforced the physician’s decision and provided valuable peer-reviewed resources to share with the patient.
Results
Four physicians who were not involved in the care of the patients independently reviewed the cases and graded them in the five areas (clarity, relevance, support, impact, and satisfaction). Data are summarized in Table 1. Across the five cases, the areas of clarity (3.55 ± 0.21), relevance (3.75 ± 0.43), support (3.35 ± 0.22), and satisfaction (3.60 ± 0.38) were highly rated (scale of 0-4). The area of impact that OE could have on the physician plan was rated low for all five cases, suggesting there was not much additional support given by the AI tool as the plans provided by both the physician and OE were aligned (Table 1).
Table 1.
Physician Scoring. a
| Case ID | Clarity | Relevance | Support | Impact | Satisfaction |
|---|---|---|---|---|---|
| Case #1 | 3.50 | 4.00 | 3.50 | 1.50 | 3.50 |
| Case #2 | 3.75 | 4.00 | 3.00 | 1.75 | 3.75 |
| Case #3 | 3.25 | 3.00 | 3.25 | 2.75 | 3.00 |
| Case #4 | 3.50 | 3.75 | 3.50 | 1.75 | 3.75 |
| Case #5 | 3.75 | 4.00 | 3.50 | 2.00 | 4.00 |
| Mean ± SD | 3.55 ± 0.21 | 3.75 ± 0.43 | 3.35 ± 0.22 | 1.95 ± 0.48 | 3.60 ± 0.38 |
Clarity: on a scale of 0 to 4 where 0 is very unclear and 4 is very clear, indicate how clear the OE recommendation was compared to the physician plan.
Relevance: on a scale of 0 to 4 where 0 is not relevant and 4 is very relevant, indicate how relevant the OE recommendation was to the specific clinical challenges presented in the case.
Support: on a scale of 0 to 4 where 0 is no evidence cited and 4 is extensive evidence cited, indicate how well OE cited and integrated evidence-based guidelines in its recommendation.
Impact: on a scale of 0 to 4 where 0 is no impact and 4 is very significant impact, indicate to what extent the OE recommendation would have modified or reinforced the physician’s original plan.
Satisfaction: on a scale of 0 to 4 where 0 is very dissatisfied and 4 is very satisfied, indicate how satisfied you are with OE performance in addressing the clinical case compared to the physician plan.
a Each case was reviewed by four physicians. The values presented correspond to the average of the four physician scores.
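The Mean ± SD row of Table 1 can be rechecked from the case-level values. The following is a minimal sketch, assuming the SD is the sample standard deviation (n − 1 denominator) taken across the five case means; the article does not state which SD formula was used, and the names `scores` and `summarize` are introduced here for illustration only.

```python
from statistics import mean, stdev

# Case-level scores copied from Table 1 (Cases 1-5). Each value is
# already the average of four physician ratings on the 0-4 scale.
scores = {
    "Clarity":      [3.50, 3.75, 3.25, 3.50, 3.75],
    "Relevance":    [4.00, 4.00, 3.00, 3.75, 4.00],
    "Support":      [3.50, 3.00, 3.25, 3.50, 3.50],
    "Impact":       [1.50, 1.75, 2.75, 1.75, 2.00],
    "Satisfaction": [3.50, 3.75, 3.00, 3.75, 4.00],
}


def summarize(values):
    """Return (mean, sample SD) rounded to two decimals.

    Assumption: sample SD (n - 1 denominator) across the case means.
    """
    return round(mean(values), 2), round(stdev(values), 2)


summary = {domain: summarize(vals) for domain, vals in scores.items()}

for domain, (m, sd) in summary.items():
    print(f"{domain:<12} {m:.2f} ± {sd:.2f}")
```

Under that assumption, the computed values match the Mean ± SD row of Table 1 for all five domains.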
Discussion
The present study demonstrates that the novel AI platform OE performed well for CDM when presented with clinical scenarios from actual patients in an ambulatory PCC setting. Four physicians independently rated the cases and verified that OE answered all five case-specific clinical questions accurately and informatively, with answers backed by cited literature and in agreement with the opinions of the physicians who saw the patients in the clinic. One of the main advantages of OE compared to similar AI platforms is that the point-of-care questions are answered from trusted, evidence-based, peer-reviewed sources (e.g., PubMed). The AI recommendations were mostly in line with the physician treatment plan and would not have had a major impact on CDM.
OE and other AI-based platforms have the potential to help close the knowledge gap that primary care physicians (PCPs) face. Although the present study demonstrated that OE reinforced the CDM by the PCPs in commonly treated conditions, the sheer volume of medical knowledge makes staying up to date on the management of chronic medical conditions challenging. 12 A recent global survey on Metabolic Dysfunction-Associated Steatotic Liver Disease (MASLD) was conducted to assess knowledge among PCPs, hepatologists (HEP), gastroenterologists (GI), and endocrinologists (ENDO). The knowledge scores were significantly higher in HEP, ENDO, and GI physicians in every domain (i.e., diagnosis, epidemiology, and treatment) when compared to PCPs. 12 There are also knowledge gaps between PCPs and specialists in other common conditions, such as sexual dysfunction and HLD management.13,14 Despite these knowledge differences, to date, most AI CDM tools have been focused on specific organ or disease problems because of the narrower data elements (i.e., liver enzymes and imaging) needed for the LLMs compared to the undifferentiated nature of PCC patients. 2
Several studies have evaluated the role of AI tools in specialty practices outside the PCCs.15,16 A recent study evaluated urologists’ use of an AI tool (Siemens AI Pathway Companion; Siemens Healthcare, Germany) to assist in CDM for ten patients with prostate cancer. 15 The AI tool provided the urologists with an aggregated dashboard of relevant disease-specific information (e.g., prostate-specific antigen, prostate MRI, pathology) leading to diagnostic and therapeutic recommendations. 15 Urologists had a significant reduction in time spent on CDM (−59%; P < .001) and had improved satisfaction when using the tool compared to normal CDM. 15 In an earlier study, twelve GI physicians utilized an institutionally developed AI tool to help with CDM (AI-Optimized) compared to standard processes. 16 The AI-optimized tool saved physicians 18% of the time (P < .02) required to answer clinical questions with no significant decrease in accuracy in CDM compared to standard processes. 16
There are a number of barriers to accessing specialty care from primary care. 17 A study in pediatric PCCs found that there were significant shortages in both medical and surgical pediatric specialties, with additional barriers such as nonparticipation in insurance plans and lack of acceptance of uninsured patients. 17 A recent study found that Medicaid patients face more challenges in accessing specialty care than commercially insured patients. 18 In this study, Medicaid enrollees had lower access rates to PCPs and some specialists, with the largest gaps observed in rural counties. 18 In the same study, access to nurse practitioners and physician assistants (NPPAs) providing primary care was higher than that to PCPs. 18 A tool such as OE that can help with a wide range of common (as demonstrated by the current study) and complex PCC questions could fill the gap between specialty and primary care by providing accurate AI-based consults with reliable, evidence-based recommendations.
The present case series has several limitations. OE was used retrospectively, from days to weeks after the patients were seen in the ambulatory PCCs. OE pulls literature based on the time it is accessed. In the present paper, the time OE was accessed differed (by weeks to 2 months) from the time the patient was seen by the physician and CDM occurred. Because of the short time period, this likely did not cause differences in CDM, but it should be acknowledged. The paper draws on only a small number of cases (five) from a single institution involving common medical conditions seen by PCPs. The accuracy of OE may be altered when used in more complex clinical situations or when there are undifferentiated diagnoses, which often present in PCCs. There is a need for more extensive prospective trials using OE that involve both complex and standard CDM and compare accuracy to other AI platforms. Furthermore, future studies should assess the performance of OE by evaluating a range of cases with the same primary diagnosis, such as DM or HTN, but different clinical presentations, to analyze how the model performs across diverse clinical manifestations of the same condition. We recognized during the data analysis that an additional question rating the physician’s opinion about the quality of the OE references used in the output would potentially improve our tool, and it will be added in the future. In addition, future trials should include other healthcare team members who work in PCCs, including NPPAs, nurses, pharmacists, and dietitians. Access to OE has previously been limited to those with an NPI number but has recently been expanded to medical students. We know from experience that other clinicians on our healthcare team use a wide range of CDM references and could benefit from an AI platform that draws from trusted sources.
Conclusions
The present study demonstrates that OpenEvidence (OE), a novel AI-powered platform, performs comparably to physician CDM in clinical scenarios involving common chronic conditions seen in primary care settings. Through retrospective evaluation of five patient cases, OE consistently provided accurate, evidence-based responses that aligned with CDM made by physicians. While OE did not modify CDM in these cases, it reinforced the physician’s plans and offered robust support through citations of peer-reviewed sources. These findings suggest that OE holds promise as an AI tool to augment clinical decision-making, helping to address the knowledge gaps often encountered in primary care compared to specialty practices. Future research is needed to evaluate OE’s utility in more complex and undifferentiated cases, assess its impact on clinical outcomes in prospective trials, and expand its use to multidisciplinary care teams. Such investigations could further establish OE as a valuable resource in improving care delivery and reducing disparities in access to specialized expertise.
Footnotes
Statements and Declarations
Ethical Considerations: Minnesota state law gives patients the right to deny access to their Minnesota medical records for research purposes. These patients provided Minnesota authorization to allow use of their Minnesota medical record information. In addition, these patients were part of a larger study determined to be exempt by the Mayo Clinic Institutional Review Board.
Consent to Participate: Mayo Clinic Institutional Review Board (IRB) reviewed the umbrella study and found it to be exempt, as referenced above, and waived the informed consent.
Author Contributions/CRediT: All the authors participated in the study concept and design, participant enrollment, data collection, analysis and interpretation of data, drafting and revising the paper for important intellectual content, and have reviewed and approved the final version of the manuscript. Each author will take public responsibility for the entire work.
Funding: The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This study was supported in part by Mayo Clinic Department of Medicine, General Internal Medicine and Department of Medicine Research Hub.
Declaration of Conflicting Interests: The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.