SynGP500: A Clinically-Grounded Synthetic Dataset of Australian General Practice Medical Notes
arxiv.org·1d

Abstract: We introduce SynGP500, a clinician-curated collection of 500 synthetic Australian general practice medical notes. The dataset integrates curriculum-based clinical breadth (RACGP 2022 Curriculum), epidemiologically calibrated prevalence (BEACH study), and diverse consultation contexts. This approach systematically includes both common presentations and less-common curriculum-specified conditions that GPs must recognize but that appear infrequently in single practice populations, potentially supporting more generalizable model training than datasets constrained by naturally occurring case distributions. SynGP500 is messy by design, reflecting the authentic complexity of hea…
