RLHF


DOG-DPO: Dynamic Optimization in Geometry for Safety Alignment

 🎯Fine-Tuning  Content type: Academic
arxiv.org·

Flow-DPPO: Divergence Proximal Policy Optimization for Flow Matching Models

 🎮Reinforcement Learning  Content type: Academic
arxiv.org·

What Do People Actually Want From AI? Mapping Preference Plurality

 🎛️Fine-tuning  Content type: Academic
arxiv.org·

Hidden Consensus: Preference-Validity Compression in Human Feedback

 🎯Fine-Tuning  Content type: Academic
arxiv.org·

RL Excursions during Pre-Training: Re-examining Policy Optimization for LLM training

 🎛️Fine-tuning  Content type: Academic
arxiv.org·

Multilingual Refusal Alignment for Safer Large Language Models

 🎯Fine-Tuning  Content type: Academic
arxiv.org·

Training LLMs to Enforce Multi-Level Instruction Hierarchies via Gravity-Weighted Direct Preference Optimization

 🎛️Fine-tuning  Content type: Academic
arxiv.org·

TLA-Prover: Verifiable TLA+ Specification Synthesis via Preference-Optimized Low-Rank Adaptation

 🎛️Fine-tuning  Content type: Academic
arxiv.org·

Pareto-Guided Teacher Alignment for Fair Personalized Text Generation

 🎛️Fine-tuning  Content type: Academic
arxiv.org·

Emergence of Context Characteristics Sensitivity in Large Language Models

 🎛️Fine-tuning  Content type: Academic
arxiv.org·

Attention Amnesia in Hybrid LLMs: When CoT Fine-Tuning Breaks Long-Range Recall, and How to Fix It

 🎛️Fine-tuning  Content type: Academic
arxiv.org·

PriFT: Prior-Support Guided Supervised Fine-Tuning

 🎛️Fine-tuning  Content type: Academic
arxiv.org·

Better Literary Translation: A Multi-Aspect Data Generation and LLM Training Approach

 🎛️Fine-tuning  Content type: Academic
arxiv.org·

Gradient-Guided Reward Optimization for Inference-time Alignment

 🎛️Fine-tuning  Content type: Academic
arxiv.org·

On the Geometry of On-Policy Distillation

 🎛️Fine-tuning  Content type: Academic
arxiv.org·

GRAIL: Gradient-Reweighted Advantages for Reinforcement Learning with Verifiable Rewards

 🎮Reinforcement Learning  Content type: Academic
arxiv.org·

DynaCF: Mitigating Shortcut Learning in Reward Models via Dynamic Counterfactual Sensitivity

 🎮Reinforcement Learning  Content type: Academic
arxiv.org·

Belief-Space Quantum-Inspired Reinforcement Learning for Partially Observable Autonomous Cyber Defense in the Internet of Vehicles

 🎮Reinforcement Learning  Content type: Academic
arxiv.org·

SCI-PRM: A Tool Aware Process Reward Model for Scientific Reasoning Verification

 🎮Reinforcement Learning  Content type: Academic
arxiv.org·

Learning to Attack and Defend: Adaptive Red Teaming of Language Models via GRPO

 🎮Reinforcement Learning  Content type: Academic
arxiv.org·
