🎯 Reinforcement Learning
RL, RLHF, reward models, policy optimization
Scoured 449 posts in 4.7 ms
Reasoning RL in 2026: GRPO, DPO, RLVR, Agentic PO & Beyond
🧠 Reasoning Models · turingpost.com · 3 days ago
Flow-DPPO: Divergence Proximal Policy Optimization for Flow Matching Models
🌐 World Models · Content type: Academic · arxiv.org · 1 day ago
How to Implement a Model-Free RL Algorithm: A Step-by-Step Guide
🌐 World Models · Content type: Blog · ujangriswanto08.medium.com · 3 hours ago
Researchers develop AI-powered railway control system for efficient urban train operation
⚖️ AI Governance · techxplore.com · 20 hours ago
Q-Learning (Reinforcement learning): Bellman Equation, Markov Decision Processes, Q-Values, and…
💾 Agent Memory · Content type: Blog · medium.com · 2 days ago
Reinforcement Learning and Optimal Control Book (RIP Dimitri Bertsekas)
🌐 World Models · Content type: Academic · web.mit.edu · 5 days ago · Hacker News
Deterministic Policy Gradient for Learning Equilibrium in Time-Inconsistent Control Problems
🌐 World Models · Content type: Academic · arxiv.org · 5 hours ago
Variational Proximal Policy Optimization
🔬 AI Research · Content type: Academic · arxiv.org · 2 days ago
Reinforcement Learning Disrupts Gradient-Based Adversarial Optimization
🌐 World Models · Content type: Academic · arxiv.org · 5 hours ago
Fast and Highly Expressive Policy Learning for Offline Reinforcement Learning via Bootstrapped Flow Q-Learning
💾 Agent Memory · Content type: Academic · arxiv.org · 1 day ago
APPO: Agentic Procedural Policy Optimization
💻 AI Coding · Content type: Academic · arxiv.org · 5 hours ago
Performance Variation in Deep Reinforcement Learning
⚡ Inference · Content type: Academic · arxiv.org · 3 days ago
A Unifying Lens on Reward Uncertainty in RLHF
🧠 LLMs · Content type: Academic · arxiv.org · 2 days ago
PAWS: Preference Learning with Advantage-Weighted Segments
🧠 Reasoning Models · Content type: Academic · arxiv.org · 5 hours ago
Geometrically Averaged Hard Target Updates for Linear Q-Learning
⚡ Inference · Content type: Academic · arxiv.org · 1 day ago
Plan-and-Verify Video Reward Reasoning with Spatio-Temporal Scene Graph Grounding
🧠 Reasoning Models · Content type: Academic · arxiv.org · 5 hours ago
Uncertainty-Aware LLM-Guided Policy Shaping for Sparse-Reward Reinforcement Learning
🤖 AI Agents · Content type: Academic · arxiv.org · 3 days ago
3SPO: State-Score-Supervised Policy Optimization for LLM Agents
🤖 AI Agents · Content type: Academic · arxiv.org · 1 day ago
Space-sampled Value Decay: Forgetting Mechanisms for Non-stationary Deep Reinforcement Learning
💾 Agent Memory · Content type: Academic · arxiv.org · 5 hours ago
Rethinking the Divergence Regularization in LLM RL
🧠 LLMs · Content type: Academic · arxiv.org · 2 days ago