AutoMedic: An Automated Evaluation Framework for Clinical Conversational Agents with Medical Dataset Grounding
arxiv.org·3d


Abstract: Evaluating large language models (LLMs) has recently emerged as a critical issue for the safe and trustworthy application of LLMs in the medical domain. Although a variety of static medical question-answering (QA) benchmarks have been proposed, many aspects remain underexplored, such as how effectively LLMs respond in dynamic, interactive, multi-turn clinical conversations and which multi-faceted evaluation strategies go beyond simple accuracy. However, formally evaluating a dynamic, interactive clinical situation is hindered by its vast combinatorial space of possible patient states and interaction trajectories, making it diffic…
