🎮 Reinforcement Learning
Q-Learning, Policy Gradient, Reward Systems, Game AI, Robotics
Scoured 419 posts in 4.6 ms

Policy Gradient for Continuous-Time Robust Markov Decision Processes
🔗 Markov Chains · Content type: Academic · arxiv.org · 6 days ago

Researchers develop AI-powered railway control system for efficient urban train operation
🤖 Machine Learning · techxplore.com · 9 hours ago

Q-Learning (Reinforcement learning): Bellman Equation, Markov Decision Processes, Q-Values, and…
🔗 Markov Chains · Content type: Blog · medium.com · 1 day ago

Reinforcement Learning and Optimal Control Book (RIP Dimitri Bertsekas)
🤖 Machine Learning · Content type: Academic · web.mit.edu · 5 days ago · Hacker News

Reasoning RL in 2026: GRPO, DPO, RLVR, Agentic PO & Beyond
🤖 AI · turingpost.com · 3 days ago

Fast and Highly Expressive Policy Learning for Offline Reinforcement Learning via Bootstrapped Flow Q-Learning
🔢 TensorFlow · Content type: Academic · arxiv.org · 18 hours ago

Variational Proximal Policy Optimization
🤖 Machine Learning · Content type: Academic · arxiv.org · 1 day ago

Geometrically Averaged Hard Target Updates for Linear Q-Learning
🤖 Machine Learning · Content type: Academic · arxiv.org · 18 hours ago

UNIQ: Conformal Calibration for Adaptive Conservatism in Offline Reinforcement Learning
🔢 TensorFlow · Content type: Academic · arxiv.org · 1 day ago

Discovering Interpretable Multi-Parameter Control Policies for Evolutionary Algorithms Using Deep Reinforcement Learning
🤖 Machine Learning · Content type: Academic · arxiv.org · 18 hours ago

The Neutral Mask: How RLHF Provides Shallow Alignment while Leaving Partisan Structure Intact in a Large Language Model
💬 Natural Language Processing · Content type: Academic · arxiv.org · 1 day ago

Geometry-Aware Reinforcement Learning for 2D Irregular Nesting
✨ Generative Art · Content type: Academic · arxiv.org · 18 hours ago

Towards End to End Motion Planning and Execution for Autonomous Underwater Vehicles Using Reinforcement Learning
🔢 TensorFlow · Content type: Academic · arxiv.org · 1 day ago

Development of COVID-19 Booster Vaccine Policy by Microsimulation and Q-learning
🤖 Machine Learning · Content type: Academic · arxiv.org · 18 hours ago

Self-Distilled Policy Gradient
🔢 TensorFlow · Content type: Academic · arxiv.org · 6 days ago

A Regret Minimization Framework on Preference Learning in Large Language Models
🤖 AI · Content type: Academic · arxiv.org · 1 day ago

Event-Driven Reinforcement Learning Enables Long-Horizon Control in Semiconductor Fabrication
🤖 Machine Learning · Content type: Academic · arxiv.org · 18 hours ago

Multilingual Sentiment Aware Text Summarization A Reinforcement Learning Approach for Consistency Maintenance
💬 Natural Language Processing · Content type: Academic · arxiv.org · 1 day ago

On Advantage Estimates for Max@K Policy Gradients
🔢 TensorFlow · Content type: Academic · arxiv.org · 5 days ago

Representation-Aware Advantage Estimation: Your Reward Model Provides More Than A Scalar Output
🤖 AI · Content type: Academic · arxiv.org · 18 hours ago