Can Post-Training Turn LLMs into Good Medical Coders? An Empirical Study of Generative ICD Coding

Automated International Classification of Diseases (ICD) coding is a core medical-coding task for billing, epidemiology, and clinical decision support. Generative large language models (LLMs) are often reported as weak medical coders, but this finding mainly comes from inference-time settings such as prompting, retrieval, reranking, or tool use, leaving the role of task-specific post-training underexplored. We present a controlled empirical stud...
