Scour
🎮 Reinforcement Learning
Q-Learning, Policy Gradient, Reward Systems, Game AI, Robotics
Scoured 145826 posts in 19.6 ms
Markov Decision Processes: The Language of Reinforcement Learning
🔗 Markov Chains · medium.com · 3d
Predictive Representations for Skill Transfer in Reinforcement Learning
Machine Learning · arxiv.org · 2h
A Quadratic-Critic Reinforcement Learning Framework for Business Decision Systems
🔢 TensorFlow · levelup.gitconnected.com · 6d
What is reinforcement learning finetuning
🔢 TensorFlow · youtube.com · 6d · Hacker News
Hierarchical Reinforcement Learning with Augmented Step-Level Transitions for LLM Agents
🔗 Markov Chains · arxiv.org · 1d
Enhancing sample efficiency in reinforcement-learning-based flow control: replacing the critic with an adaptive reduced-order model
🔢 TensorFlow · arxiv.org · 1d
Reinforcement Learning for LLM Post-Training: A Survey
🔢 TensorFlow · arxiv.org · 2h
Robots that learn to evaluate models of collective behavior
🔗 Markov Chains · arxiv.org · 2h
Drift-Based Policy Optimization: Native One-Step Policy Learning for Online Robot Control
🔢 TensorFlow · arxiv.org · 2d
DROP: Distributional and Regular Optimism and Pessimism for Reinforcement Learning
Transformers · arxiv.org · 2h
Anticipatory Reinforcement Learning: From Generative Path-Laws to Distributional Value Functions
🔗 Markov Chains · arxiv.org · 2d
Boosted Distributional Reinforcement Learning: Analysis and Healthcare Applications
🔢 TensorFlow · arxiv.org · 2d
Model-Based Reinforcement Learning for Control under Time-Varying Dynamics
🔗 Markov Chains · arxiv.org · 6d
Value Mirror Descent for Reinforcement Learning
🔗 Markov Chains · arxiv.org · 1d
Optimizing Neurorobot Policy under Limited Demonstration Data through Preference Regret
🔢 TensorFlow · arxiv.org · 2d
Contextual Intelligence The Next Leap for Reinforcement Learning
🔢 TensorFlow · arxiv.org · 3d
Provable Multi-Task Reinforcement Learning: A Representation Learning Framework with Low Rank Rewards
🔢 TensorFlow · arxiv.org · 2d
A Multi-Agent Reinforcement Learning Framework for Public Health Decision Analysis
🔗 Markov Chains · arxiv.org · 2d
Behavior-Constrained Reinforcement Learning with Receding-Horizon Credit Assignment for High-Performance Control
AI · arxiv.org · 3d
MC-CPO: Mastery-Conditioned Constrained Policy Optimization
🔢 TensorFlow · arxiv.org · 2d