Small Models Achieve Large Language Model Performance: Evaluating Reasoning-Enabled AI for Secure Child Welfare Research
arxiv.org·5d
🔍Type Inference
Preview
Report Post

View PDF

Abstract:Objective: This study develops a systematic benchmarking framework for testing whether language models can accurately identify constructs of interest in child welfare records. The objective is to assess how different model sizes and architectures perform on four validated benchmarks for classifying critical risk factors among child welfare-involved families: domestic violence, firearms, substance-related problems generally, and opioids specifically. Method: We constructed four benchmarks for identifying risk factors in child welfare investigation summaries: domestic violence, substance-related problems, firearms, and opioids (n=500 each). We evaluated seven model sizes (0.6B-32B parameters) in standard and extended reasoning …

Similar Posts

Loading similar posts...