Revisiting LLM Adaptation for 3D CT Report Generation: A Study of Scaling and Diagnostic Priors

Recent advances in multimodal learning, including large language models (LLMs) and vision-language models (VLMs), have demonstrated strong adaptability to natural images. However, extending their use to the medical domain, particularly for volumetric (3D) images, is challenging due to high computational complexity, volumetric dependencies and the semantic gap between visual features and clinical terminology. Naively fine-tuning LLMs on limited m...
