🤖 LLMs - machacek.vitek · Scour

Principled Agent Debate: Adversarial Arbitration for Sycophancy Reduction in Large Language Models

🤖AI Academic

Less-relevant results

Neglected Basics of AI Alignment

lesswrong.com

EvalStop: Using World Feedback to Detect and Correct Reward Overoptimization in Multi-Tenant RLHF Platforms

⚙️C++ Academic

Sequential Data Poisoning in LLM Post-Training

🤖AI Academic

Do We Want a Superintelligent People-Pleaser?

🎯Career Growth

lesswrong.com

BiasGRPO: Stabilizing Bias Mitigation in High-Variance Reward Landscapes via Group-Relative Policy Optimization

🌐Distributed Systems Academic

Sparse Mixture-of-Experts Reward Models Learn Interpretable and Specialized Experts for Personalized Preference Modeling

🤖AI Academic
