Error Correction in Radiology Reports: A Knowledge Distillation-Based Multi-Stage Framework


Abstract: The increasing complexity and workload of clinical radiology lead to inevitable oversights and mistakes in reports used as diagnostic tools, causing delayed treatments and sometimes life-threatening harm to patients. While large language models (LLMs) have shown remarkable progress in many tasks, their utility in detecting and correcting errors in radiology reporting is limited. This paper proposes a novel dual-knowledge infusion framework that enhances LLMs’ capability for radiology report proofreading through systematic integration of medical expertise. Specifically, the knowledge infusion combines medical knowledge graph distillation (MKGD) with external knowledge retrieval (EXKR), enabling an effective automated approach to tackling mistakes in radiology reporting. By decomposing the complex proofreading task into three specialized stages of detection, localization, and correction, our method mirrors the systematic review process employed by expert radiologists, ensuring both precision and clinical interpretability. To perform a robust, clinically relevant evaluation, a comprehensive benchmark is also proposed using real-world radiology reports with real-world error patterns, including speech recognition confusions, terminology ambiguities, and template-related inconsistencies. Extensive evaluations across multiple LLM architectures demonstrate substantial improvements from our approach: up to a 31.56% increase in error detection accuracy and a 37.4% reduction in processing time. Human evaluation by radiologists confirms superior clinical relevance and factual consistency compared to existing approaches.


Comments:	Accepted to AAAI 2026
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2406.15045 [cs.CL]
	(or arXiv:2406.15045v3 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2406.15045 arXiv-issued DOI via DataCite

Submission history

From: Jinge Wu
[v1] Fri, 21 Jun 2024 10:48:21 UTC (265 KB)
[v2] Tue, 17 Sep 2024 18:57:49 UTC (631 KB)
[v3] Wed, 12 Nov 2025 02:55:25 UTC (262 KB)
