RLHF

Reinforcement Learning from Human Feedback, Alignment, Reward Modeling, Fine-tuning

Representation-Aware Advantage Estimation: Your Reward Model Provides More Than A Scalar Output

reinforcement learning, deep learning, machine learning
Content type: Academic
arxiv.org
Less-relevant results

Flow-DPPO: Divergence Proximal Policy Optimization for Flow Matching Models

Q-Learning
Content type: Academic
arxiv.org

Multilingual Refusal Alignment for Safer Large Language Models

recommendation systems, LLM, large language model
Content type: Academic
arxiv.org

A Regret Minimization Framework on Preference Learning in Large Language Models

reinforcement learning, deep learning, machine learning
Content type: Academic
arxiv.org

Sequent: scale and automation for higher confidence in alignment

reinforcement learning, deep learning, machine learning
lesswrong.com

Multilingual Sentiment Aware Text Summarization A Reinforcement Learning Approach for Consistency Maintenance

reinforcement learning, deep learning, machine learning
Content type: Academic
arxiv.org

DOG-DPO: Dynamic Optimization in Geometry for Safety Alignment

Q-Learning
Content type: Academic
arxiv.org

SARM2: Multi-Task Stage Aware Reward Modeling for Self Improving Robotic Manipulation

Q-Learning
Content type: Academic
arxiv.org

Beyond Rubrics: Exploration-Guided Evaluation Skills for Reward Modeling

recommendation systems, LLM, large language model
Content type: Academic
arxiv.org

A Unifying Lens on Reward Uncertainty in RLHF

reinforcement learning, deep learning, machine learning
Content type: Academic
arxiv.org

Uncertainty-Aware LLM-Guided Policy Shaping for Sparse-Reward Reinforcement Learning

Q-Learning
Content type: Academic
arxiv.org

What Fits (Into Few Tokens) Doesn't Overfit: Compression and Generalization in ML Research Agents

reinforcement learning, deep learning, machine learning
Content type: Academic
arxiv.org

Self-evolving LLM agents with in-distribution Optimization

Q-Learning
Content type: Academic
arxiv.org

Korean Culture into LLM Alignment: Toward Cultural Coherence

recommendation systems, LLM, large language model
Content type: Academic
arxiv.org

What Do People Actually Want From AI? Mapping Preference Plurality

reinforcement learning, deep learning, machine learning
Content type: Academic
arxiv.org

SkelDPO: A Skeleton-Guided Direct Preference Optimization Framework for Efficient Code Generation

recommendation systems, LLM, large language model
Content type: Academic
arxiv.org

Substrate Asymmetry in User-Side Memory: A Diagnostic Framework

reinforcement learning, deep learning, machine learning
Content type: Academic
arxiv.org

Pareto-Guided Teacher Alignment for Fair Personalized Text Generation

NLP
Content type: Academic
arxiv.org

Learning to Attack and Defend: Adaptive Red Teaming of Language Models via GRPO

Transformers
Content type: Academic
arxiv.org