MedRECT: A Medical Reasoning Benchmark for Error Correction in Clinical Texts
arxiv.org·11h

Abstract: Large language models (LLMs) show increasing promise in medical applications, but their ability to detect and correct errors in clinical texts – a prerequisite for safe deployment – remains under-evaluated, particularly beyond English. We introduce MedRECT, a cross-lingual benchmark (Japanese/English) that formulates medical error handling as three subtasks: error detection, error localization (sentence extraction), and error correction. MedRECT is built with a scalable, automated pipeline from the Japanese Medical Licensing Examinations (JMLE) and a curated English counterpart, yielding MedRECT-ja (663 texts) and MedRECT-en (458 texts) with comparable error…
