Challenges of Evaluating LLM Safety for User Welfare
arxiv.org·3d

Title: Challenges of Evaluating LLM Safety for User Welfare

Abstract: Safety evaluations of large language models (LLMs) typically focus on universal risks like dangerous capabilities or undesirable propensities. However, millions of people use LLMs for personal advice on high-stakes topics like finance and health, where harms are context-dependent rather than universal. While frameworks like the OECD’s AI classification recognize the need to assess individual risks, user-welfare safety evaluations remain underdeveloped. We argue that developing such evaluations is non-trivial due to fundamental questions about accounting for user context in evaluation design. In this exploratory stud…
