Reinforcement Learning from Human Feedback

Introducing the Google Colab CLI

 ⚙️ML Systems  Content type: Blog

Training LLMs to Enforce Multi-Level Instruction Hierarchies via Gravity-Weighted Direct Preference Optimization

 ⚙️ML Systems  Content type: Academic
arxiv.org

Robust Multi-Mutant Protein Stability Prediction from a Fine-Tuned Evolutionary Scale Model

 🧠Deep Learning  Content type: Academic
biorxiv.org

PriFT: Prior-Support Guided Supervised Fine-Tuning

 🎮RL  Content type: Academic
arxiv.org

Stack Overflow didn't just help AI learn to code

 🤖ML

A Unifying Lens on Reward Uncertainty in RLHF

 🎮RL  Content type: Academic
arxiv.org

Flow-DPPO: Divergence Proximal Policy Optimization for Flow Matching Models

 🎮RL  Content type: Academic
arxiv.org

Attention Amnesia in Hybrid LLMs: When CoT Fine-Tuning Breaks Long-Range Recall, and How to Fix It

 🧠Deep Learning  Content type: Academic
arxiv.org

Beyond Rubrics: Exploration-Guided Evaluation Skills for Reward Modeling

 🎮RL  Content type: Academic
arxiv.org

Optimisation over non-stationary distributions creates weirder minds

 🎮RL
lesswrong.com

Multilingual Sentiment Aware Text Summarization: A Reinforcement Learning Approach for Consistency Maintenance

 🎮RL  Content type: Academic
arxiv.org

Fisher-Guided Progressive Parameter Selection for Adaptive Fine-Tuning

 🤖ML  Content type: Academic
arxiv.org

pLM-Guided Inverse Folding for Antibody Sequence Design

 🛠️Systems Programming  Content type: Academic
biorxiv.org

A Finetuned SpeechLLM for Joint Multi-Granular L2 Assessment and Natural-Language Rationales

 ⚙️ML Systems  Content type: Academic
arxiv.org

Emergence of Context Characteristics Sensitivity in Large Language Models

 🎮RL  Content type: Academic
arxiv.org

The Order Matters: Sequential Fine-Tuning of LLaMA for Coherent Automated Essay Scoring

 ⚙️ML Systems  Content type: Academic
arxiv.org

Training Deliberative Monitors for Black-Box Scheming Detection

 🎮RL
lesswrong.com

Alignment Defends LLMs from Property Inference Attacks

 🎮RL  Content type: Academic
arxiv.org

PAFO: Pareto Fairness Optimization for Personalized Reward Modeling

 🎮RL  Content type: Academic
arxiv.org

When RL Fails after SFT: Rejuvenating Model Plasticity for Robust SFT-to-RL Handoff

 🎮RL  Content type: Academic
arxiv.org