Scour
🎯 Reinforcement Learning
RL, RLHF, reward models, policy optimization
Scoured 405 posts in 7.9 ms

Reasoning RL in 2026: GRPO, DPO, RLVR, Agentic PO & Beyond
🧠 Reasoning Models
turingpost.com · 3 days ago

Flow-DPPO: Divergence Proximal Policy Optimization for Flow Matching Models
🌐 World Models
Content type: Academic
arxiv.org · 1 day ago

How to Implement a Model-Free RL Algorithm: A Step-by-Step Guide
🌐 World Models
Content type: Blog
ujangriswanto08.medium.com · 2 hours ago

Researchers develop AI-powered railway control system for efficient urban train operation
⚖️ AI Governance
techxplore.com · 18 hours ago

Q-Learning (Reinforcement learning): Bellman Equation, Markov Decision Processes, Q-Values, and…
💾 Agent Memory
Content type: Blog
medium.com · 2 days ago

Reinforcement Learning and Optimal Control Book (RIP Dimitri Bertsekas)
🌐 World Models
Content type: Academic
web.mit.edu · 5 days ago · Hacker News

Plan-and-Verify Video Reward Reasoning with Spatio-Temporal Scene Graph Grounding
🧠 Reasoning Models
Content type: Academic
arxiv.org · 3 hours ago

Variational Proximal Policy Optimization
🔬 AI Research
Content type: Academic
arxiv.org · 2 days ago

Fast and Highly Expressive Policy Learning for Offline Reinforcement Learning via Bootstrapped Flow Q-Learning
💾 Agent Memory
Content type: Academic
arxiv.org · 1 day ago

Performance Variation in Deep Reinforcement Learning
⚡ Inference
Content type: Academic
arxiv.org · 3 days ago

A Unifying Lens on Reward Uncertainty in RLHF
🧠 LLMs
Content type: Academic
arxiv.org · 2 days ago

Geometrically Averaged Hard Target Updates for Linear Q-Learning
⚡ Inference
Content type: Academic
arxiv.org · 1 day ago

Uncertainty-Aware LLM-Guided Policy Shaping for Sparse-Reward Reinforcement Learning
🤖 AI Agents
Content type: Academic
arxiv.org · 3 days ago

3SPO: State-Score-Supervised Policy Optimization for LLM Agents
🤖 AI Agents
Content type: Academic
arxiv.org · 1 day ago

Rethinking the Divergence Regularization in LLM RL
🧠 LLMs
Content type: Academic
arxiv.org · 2 days ago

On Advantage Estimates for Max@K Policy Gradients
⚡ Inference
Content type: Academic
arxiv.org · 6 days ago

The Neutral Mask: How RLHF Provides Shallow Alignment while Leaving Partisan Structure Intact in a Large Language Model
🧠 LLMs
Content type: Academic
arxiv.org · 2 days ago

UNIQ: Conformal Calibration for Adaptive Conservatism in Offline Reinforcement Learning
⚡ Inference
Content type: Academic
arxiv.org · 2 days ago

Representation Learning Enables Scalable Multitask Deep Reinforcement Learning
🌐 World Models
Content type: Academic
arxiv.org · 6 days ago

DriveReward: A Comprehensive Dataset and Generative Vision-Language Reward Model for Autonomous Driving
👁️ Multimodal AI
Content type: Academic
arxiv.org · 2 days ago