🤖 AI - clasnake · Scour

Principled Agent Debate: Adversarial Arbitration for Sycophancy Reduction in Large Language Models

🕹️AI Agents Academic

Do We Want a Superintelligent People-Pleaser?

🤖Claude Code

lesswrong.com

EvalStop: Using World Feedback to Detect and Correct Reward Overoptimization in Multi-Tenant RLHF Platforms

🎯Product Development Academic

Sequential Data Poisoning in LLM Post-Training

🛠️LLM Tooling Academic

BiasGRPO: Stabilizing Bias Mitigation in High-Variance Reward Landscapes via Group-Relative Policy Optimization

🕹️AI Agents Academic

Sparse Mixture-of-Experts Reward Models Learn Interpretable and Specialized Experts for Personalized Preference Modeling

🕹️AI Agents Academic
