RLHF

Reinforcement Learning from Human Feedback, Alignment, Reward Models

When RL Fails after SFT: Rejuvenating Model Plasticity for Robust SFT-to-RL Handoff

 🎮Reinforcement Learning  Content type: Academic
arxiv.org

Reasoning RL in 2026: GRPO, DPO, RLVR, Agentic PO & Beyond

 🎮Reinforcement Learning
turingpost.com

Tracing Eval-Awareness Emergence Through Training of OLMo 3

 🎯Fine-tuning
lesswrong.com

Less-relevant results

ApodexAI/AgentHarness: Evaluation harness for Apodex-1.0 on public deep-research benchmarks.

 🧠OpenAI  Content type: Code
github.com · Hacker News

Nvidia Nemotron 3 Ultra

 🤖AI

Researchers trained an open source AI search agent, Harness-1, that outperforms GPT-5.4 on recalling relevant information

 🎮Reinforcement Learning

Why Claude Produces High-Quality Output: A Developer’s Guide to Token Efficiency and Hallucination…

 🎭Anthropic Claude  Content type: Blog
medium.com

SFT & the Locus Awards

 Gemini
sfintranslation.com

DiffusionGemma: The Developer Guide

 🎯Fine-tuning  Content type: Blog

From 1 July, the AP will check the registration of scan cars in the algorithm register

 🎯Fine-tuning

How to reduce capability degradation from off-model SFT

 ✍️Prompt Engineering
lesswrong.com

local AI agents for Cursor with pre-tuned marketplace/commu

 🧠OpenAI
locaible.com · Hacker News

Mult-DPO: Multinomial Direct Preference Optimization for Recommender Systems

 🎯Fine-tuning  Content type: Academic
arxiv.org

Stack Overflow didn't just help AI learn to code

 🤖AI

AI2's Nathan Lambert says Nvidia's multi-teacher on-policy distillation for Nemotron 3 Ultra is the post-training industry standard

 🧠OpenAI
digg.com

Why LLMs (still) lack taste

 🤖AI

GDPR request

 🎯Fine-tuning

Vibe Diaries: Training Nanochat

 🔤Tokenization
vibediary.dev · Hacker News

Deep Learning Weekly: Issue 458

 🤖Agent

A Unifying Lens on Supervised Fine-Tuning Through Target Distribution Design

 🤖LLM  Content type: Academic
arxiv.org
