Abstract:Language models (LMs) have demonstrated expert-level reasoning and recall abilities in medicine. However, computational costs and privacy concerns are mounting barriers to wide-scale implementation. To address these significant limitations, we introduce a parsimonious adaptation of phi-3-mini, MedMobile, a 3.8 billion parameter LM capable of running on a mobile device, for medical applications. We perform a careful set of pipeline additions and demonstrate that chain of thought, ensembling, and fine-tuning lead to the greatest performance gains, while unexpectedly retrieval augmented generation fails to demonstrate significant improvements. We evaluate the efficiency of our pipeline on the MultiMedQA and MedBullets. We demons…
Abstract:Language models (LMs) have demonstrated expert-level reasoning and recall abilities in medicine. However, computational costs and privacy concerns are mounting barriers to wide-scale implementation. To address these significant limitations, we introduce a parsimonious adaptation of phi-3-mini, MedMobile, a 3.8 billion parameter LM capable of running on a mobile device, for medical applications. We perform a careful set of pipeline additions and demonstrate that chain of thought, ensembling, and fine-tuning lead to the greatest performance gains, while unexpectedly retrieval augmented generation fails to demonstrate significant improvements. We evaluate the efficiency of our pipeline on the MultiMedQA and MedBullets. We demonstrate that MedMobile scores 75.7% on the MedQA (USMLE), surpassing the passing mark for licensed physicians (~60%) and rivaling scores of models 100 times its size. Across the entirety of the MultiMedQA, MedMobile achieves SOTA performance for models with less than 5B parameters and represents the smallest model to pass the MedQA (USMLE). MedMobile holds promise to democratize access to language models in medicine, bolstering lower compute needs and fast inference speeds. With the ability to combat the biggest barriers to entry for language models in medicine, we hope that MedMobile is a critical step forward in developing clinically relevant language models.
| Comments: | 24 pages, 7 figures (4 main, 3 supplementary) |
| Subjects: | Computation and Language (cs.CL) |
| Cite as: | arXiv:2410.09019 [cs.CL] |
| (or arXiv:2410.09019v2 [cs.CL] for this version) | |
| https://doi.org/10.48550/arXiv.2410.09019 arXiv-issued DOI via DataCite |
Submission history
From: Krithik Vishwanath Mr. [view email] [v1] Fri, 11 Oct 2024 17:32:59 UTC (602 KB) [v2] Wed, 12 Nov 2025 21:09:25 UTC (989 KB)