RLHF

Reinforcement Learning from Human Feedback, Reward Modeling, Preference Learning, Alignment

Emergence of Context Characteristics Sensitivity in Large Language Models

 post training infra  Content type: Academic
arxiv.org·

Variational Proximal Policy Optimization

 🎭Mixture of Experts  Content type: Academic
arxiv.org·

DOG-DPO: Dynamic Optimization in Geometry for Safety Alignment

 post training infra  Content type: Academic
arxiv.org·
Less-relevant results

Turkish Navy Confirms 2032 Delivery Date for MUGEM Aircraft Carrier

 post training infra
navalnews.com·

Representation-Aware Advantage Estimation: Your Reward Model Provides More Than A Scalar Output

 post training infra  Content type: Academic
arxiv.org·

PayPal and Hey Savi Launch UK’s First Agentic Commerce Platform, Debenhams Group Signs On

 🤖agentic system
easternherald.com·

(VERY PARTIAL) CROSSPOST: ALEX HEATH: Substack Is Opening Up to AI: Interviewing CEO Chris Best

 post training infra  Content type: News  Content type: Blog

Multilingual Refusal Alignment for Safer Large Language Models

 📊LLM Evaluation  Content type: Academic
arxiv.org·

PAFO: Pareto Fairness Optimization for Personalized Reward Modeling

 post training infra  Content type: Academic
arxiv.org·

Substrate Asymmetry in User-Side Memory: A Diagnostic Framework

 🎛️Fine-Tuning  Content type: Academic
arxiv.org·

Harmfulness Directions in OLMo

 post training infra
lesswrong.com·

A Finetuned SpeechLLM for Joint Multi-Granular L2 Assessment and Natural-Language Rationales

 📊LLM Evaluation  Content type: Academic
arxiv.org·

TLA-Prover: Verifiable TLA+ Specification Synthesis via Preference-Optimized Low-Rank Adaptation

 post training infra  Content type: Academic
arxiv.org·

The Shibboleth Effect: Auditing the Cross-Lingual Distributional Skew of Large Language Models

 post training infra  Content type: Academic
arxiv.org·

Emergent alignment and the projectability of ethical personas

 🎛️Fine-Tuning  Content type: Academic
arxiv.org·

What Do People Actually Want From AI? Mapping Preference Plurality

 post training infra  Content type: Academic
arxiv.org·

Mechanistic Analysis of Alignment Algorithms in Language Models

 post training infra  Content type: Academic
arxiv.org·

Korean Culture into LLM Alignment: Toward Cultural Coherence

 post training infra  Content type: Academic
arxiv.org·

Hidden Consensus: Preference-Validity Compression in Human Feedback

 post training infra  Content type: Academic
arxiv.org·

Better Literary Translation: A Multi-Aspect Data Generation and LLM Training Approach

 post training infra  Content type: Academic
arxiv.org·
