General-purpose chatbots outperform clinical AI tools on physicians’ real-world questions

Specialized clinical AI tools are entering medical practice with little independent testing. In a head-to-head evaluation across two public benchmarks and real questions from physicians, three general-purpose frontier large language models outperformed two leading clinical AI tools, which performed no better than Google Search's AI Overview.

The problem

Generative artificial intelligence (AI) has rapidly entered medicine. Hospitals, payers and individual clinicians now have access to two broad ...
