RLHF

reinforcement learning from human feedback, preference learning, reward modeling, DPO

Sequential Data Poisoning in LLM Post-Training

 🤖LLMs  Content type: Academic
arxiv.org·

Why LLMs (still) lack taste

 🤖LLMs

Tracing Eval-Awareness Emergence Through Training of OLMo 3

 🎯LLM Finetuning
lesswrong.com·

Reasoning RL in 2026: GRPO, DPO, RLVR, Agentic PO & Beyond

 🎯LLM Finetuning
turingpost.com·

Less-relevant results

The Non Profit Association Delivering Future Collaborative Opensource Tools for Energy System Simulation

 🧪Synthetic Data
cresym.eu·

TRL: GIVE EVERYBODY IN SCOTLAND A SHOVEL

 🎯LLM Finetuning  Content type: Blog
channel-6.ghost.io·

Google Colab CLI opens runtimes to Claude Code and Codex

 🎯LLM Finetuning

KJLdefeated/RL.cu: RLVR training for LLM in CUDA/C++

 Continuous Batching  Content type: Code
github.com··Hacker News

AI Paper Review: Training Language Models to Follow Instructions with Human Feedback (InstructGPT)

 💬Natural Language Processing
freecodecamp.org·

Deep Learning Weekly: Issue 458

 🤖AI Agents

Sequent: scale and automation for higher confidence in alignment

 🛡️AI Safety
lesswrong.com·

Mult-DPO: Multinomial Direct Preference Optimization for Recommender Systems

 🎯LLM Finetuning  Content type: Academic
arxiv.org·

Nvidia Nemotron 3 Ultra

 🗣️NLP

Posting for authoring

 🎯LLM Finetuning
turingpost.com·

local AI agents for Cursor with pre-tuned marketplace/commu

 🏢LLM Adoption
locaible.com··Hacker News

Show HN: The Deterministic Core Architecture for AI-Augmented Applications

 🗂️RAG Systems

The Neutral Mask: How RLHF Provides Shallow Alignment while Leaving Partisan Structure Intact in a Large Language Model

 🤖LLMs  Content type: Academic
arxiv.org·

From 1 July, the AP will check the registration of scan cars in the algorithm register

 🎯LLM Finetuning

Aviation’s non-CO2 impacts on the climate 2026

 🎯LLM Finetuning
ukri.org·

EDPB meets with EU Commissioner McGrath and adopts common data breach notification template

 🎯LLM Finetuning
edpb.europa.eu·
