🤖 AI - krks.gbr · Scour

What Do People Actually Want From AI? Mapping Preference Plurality

💬LLMs Academic

Do We Want a Superintelligent People-Pleaser?

lesswrong.com

EvalStop: Using World Feedback to Detect and Correct Reward Overoptimization in Multi-Tenant RLHF Platforms

💬LLMs Academic

Sparse Mixture-of-Experts Reward Models Learn Interpretable and Specialized Experts for Personalized Preference Modeling

💬LLMs Academic

BiasGRPO: Stabilizing Bias Mitigation in High-Variance Reward Landscapes via Group-Relative Policy Optimization

💬LLMs Academic

Sequential Data Poisoning in LLM Post-Training

💬LLMs Academic
