🎮 Reinforcement Learning
RL, reward functions, policy gradient, RLHF
Scoured 424 posts in 9.7 ms
Variational Proximal Policy Optimization
💾 Agent Memory · Academic · arxiv.org · 1 day ago
Reasoning RL in 2026: GRPO, DPO, RLVR, Agentic PO & Beyond
🧠 LLMs · turingpost.com · 3 days ago
Researchers develop AI-powered railway control system for efficient urban train operation
🤖 AI Agents · techxplore.com · 8 hours ago
Q-Learning (Reinforcement learning): Bellman Equation, Markov Decision Processes, Q-Values, and…
♟️ Game Theory · Blog · medium.com · 1 day ago
Reinforcement Learning and Optimal Control Book (RIP Dimitri Bertsekas)
♟️ Game Theory · Academic · web.mit.edu · 5 days ago · Hacker News
Fast and Highly Expressive Policy Learning for Offline Reinforcement Learning via Bootstrapped Flow Q-Learning
🤖 AI Agents · Academic · arxiv.org · 17 hours ago
Geometrically Averaged Hard Target Updates for Linear Q-Learning
🧠 LLMs · Academic · arxiv.org · 17 hours ago
The Neutral Mask: How RLHF Provides Shallow Alignment while Leaving Partisan Structure Intact in a Large Language Model
🧠 LLMs · Academic · arxiv.org · 1 day ago
Representation-Aware Advantage Estimation: Your Reward Model Provides More Than A Scalar Output
🧠 LLMs · Academic · arxiv.org · 17 hours ago
Multilingual Sentiment Aware Text Summarization A Reinforcement Learning Approach for Consistency Maintenance
🧠 LLMs · Academic · arxiv.org · 1 day ago
Policy Gradient for Continuous-Time Robust Markov Decision Processes
🧠 LLMs · Academic · arxiv.org · 6 days ago
A Unifying Lens on Reward Uncertainty in RLHF
🧠 LLMs · Academic · arxiv.org · 1 day ago
Discovering Interpretable Multi-Parameter Control Policies for Evolutionary Algorithms Using Deep Reinforcement Learning
🤖 AI Agents · Academic · arxiv.org · 17 hours ago
A Regret Minimization Framework on Preference Learning in Large Language Models
🧠 LLMs · Academic · arxiv.org · 1 day ago
Geometry-Aware Reinforcement Learning for 2D Irregular Nesting
🤖 AI Agents · Academic · arxiv.org · 17 hours ago
Self-Distilled Policy Gradient
📡 Information Theory · Academic · arxiv.org · 6 days ago
Development of COVID-19 Booster Vaccine Policy by Microsimulation and Q-learning
♟️ Game Theory · Academic · arxiv.org · 17 hours ago
UNIQ: Conformal Calibration for Adaptive Conservatism in Offline Reinforcement Learning
🤝 AI-Assisted Coding · Academic · arxiv.org · 1 day ago
Sparse Mixture-of-Experts Reward Models Learn Interpretable and Specialized Experts for Personalized Preference Modeling
🧠 LLMs · Academic · arxiv.org · 6 days ago
Flow-DPPO: Divergence Proximal Policy Optimization for Flow Matching Models
🤖 AI Agents · Academic · arxiv.org · 17 hours ago