RLHF

Reinforcement Learning from Human Feedback, Alignment, Reward Modeling, Fine-tuning


Reasoning RL in 2026: GRPO, DPO, RLVR, Agentic PO & Beyond

馃reinforcement learning, deep learning, machine learning
turingpost.com

Mult-DPO: Multinomial Direct Preference Optimization for Recommender Systems

📚 Information Retrieval · Content type: Academic
arxiv.org

How ChatGPT Actually Works (Beginner Friendly)

馃reinforcement learning, deep learning, machine learningContent type: Blog
medium.com

Tracing Eval-Awareness Emergence Through Training of OLMo 3

✍️ Prompt Engineering
lesswrong.com

How LLMs are Actually Trained

馃reinforcement learning, deep learning, machine learningContent type: NewsContent type: Blog
blog.algomaster.io

SLUUG Talk: Demystifying Large Language Models on Linux

馃reinforcement learning, deep learning, machine learningContent type: Code
github.comDEV
Less-relevant results

Don't let the LLM speak, just probe it (8 minute read)

✍️ Prompt Engineering · Content type: Blog
blog.j11y.io · Hacker News

The week AI infrastructure crossed from a technology story to a financial one

✍️ Prompt Engineering · Content type: News
mlwhiz.com

Would a prepaid pass for a coding agent solve a real need or is it just my itch?

馃recommendation systems, LLM, large langurage model
codehamr.comr/SideProject

Analyzing and Improving Fine-grained Preference Optimization in Medical LVLMs

馃recommendation systems, LLM, large langurage modelContent type: Academic
arxiv.org

I built a machine that turns AI papers into interactive explainers

✍️ Prompt Engineering · Content type: Blog
blog.skz.dev

Researchers develop AI-powered railway control system for efficient urban train operation

馃reinforcement learning, deep learning, machine learning
techxplore.com

Alignment Defends LLMs from Property Inference Attacks

馃reinforcement learning, deep learning, machine learningContent type: Academic
arxiv.org

Posting for authoring

✍️ Prompt Engineering
turingpost.com

umair-tareen/philosopher-council: An eleven-philosopher LLM council - ask it questions or point it at AI-research trends. Claude-powered deliberation through the four classical branches of philosophy. Methodology, not metaphysics.

馃reinforcement learning, deep learning, machine learningContent type: Code
github.comr/SideProject

local AI agents for Cursor with pre-tuned marketplace/commu

✍️ Prompt Engineering
locaible.com · Hacker News

Scale Robot Reinforcement Learning with NVIDIA Isaac Lab on Amazon SageMaker AI

馃reinforcement learning, deep learning, machine learningContent type: Blog
aws.amazon.com
