Scour
🎯 RLHF
Reinforcement Learning, Human Feedback, LLM Alignment
Scoured 142 posts in 10.1 ms
Sequential Data Poisoning in LLM Post-Training
🎛️ Fine-tuning · Content type: Academic · arxiv.org · 6 days ago
Tracing Eval-Awareness Emergence Through Training of OLMo 3
🎛️ Fine-tuning · lesswrong.com · 7 hours ago
Measuring Embedding Drift: Why Hybrid Search Saves Stale Models.
🎯 Fine-Tuning · pub.towardsai.net · 12 hours ago
A Unifying Lens on Reward Uncertainty in RLHF
🎮 Reinforcement Learning · Content type: Academic · arxiv.org · 1 day ago
Sequent: scale and automation for higher confidence in alignment
🎯 Fine-Tuning · lesswrong.com · 2 hours ago
The Neutral Mask: How RLHF Provides Shallow Alignment while Leaving Partisan Structure Intact in a Large Language Model
🎯 Fine-Tuning · Content type: Academic · arxiv.org · 1 day ago
BiasGRPO: Stabilizing Bias Mitigation in High-Variance Reward Landscapes via Group-Relative Policy Optimization
🎯 Fine-Tuning · Content type: Academic · arxiv.org · 6 days ago
Mult-DPO: Multinomial Direct Preference Optimization for Recommender Systems
🎛️ Fine-tuning · Content type: Academic · arxiv.org · 13 hours ago
Variational Proximal Policy Optimization
🎮 Reinforcement Learning · Content type: Academic · arxiv.org · 1 day ago
Representation-Aware Advantage Estimation: Your Reward Model Provides More Than A Scalar Output
🎮 Reinforcement Learning · Content type: Academic · arxiv.org · 13 hours ago
Multilingual Sentiment Aware Text Summarization A Reinforcement Learning Approach for Consistency Maintenance
🎯 Fine-Tuning · Content type: Academic · arxiv.org · 1 day ago
Mechanistic Analysis of Alignment Algorithms in Language Models
🎯 Fine-Tuning · Content type: Academic · arxiv.org · 13 hours ago
Sparse Mixture-of-Experts Reward Models Learn Interpretable and Specialized Experts for Personalized Preference Modeling
🎛️ Fine-tuning · Content type: Academic · arxiv.org · 6 days ago
A Regret Minimization Framework on Preference Learning in Large Language Models
🎮 Reinforcement Learning · Content type: Academic · arxiv.org · 1 day ago
Alignment Defends LLMs from Property Inference Attacks
🎯 Fine-Tuning · Content type: Academic · arxiv.org · 13 hours ago
DriveReward: A Comprehensive Dataset and Generative Vision-Language Reward Model for Autonomous Driving
🎮 Reinforcement Learning · Content type: Academic · arxiv.org · 1 day ago
EvalStop: Using World Feedback to Detect and Correct Reward Overoptimization in Multi-Tenant RLHF Platforms
🎛️ Fine-tuning · Content type: Academic · arxiv.org · 6 days ago
When RL Fails after SFT: Rejuvenating Model Plasticity for Robust SFT-to-RL Handoff
🎛️ Fine-tuning · Content type: Academic · arxiv.org · 13 hours ago
RASFT: Rollout-Adaptive Supervised Fine-Tuning for Reasoning
🎛️ Fine-tuning · Content type: Academic · arxiv.org · 2 days ago
A Unifying Lens on Supervised Fine-Tuning Through Target Distribution Design
🎛️ Fine-tuning · Content type: Academic · arxiv.org · 13 hours ago