🎯 Reinforcement Learning
Q-learning, Policy Gradient, Reward Functions, TD Learning
Predictive Representations for Skill Transfer in Reinforcement Learning
📊 Dynamic Programming · arxiv.org · 5d
Introduction to Reinforcement Learning Agents with the Unity Game Engine
📊 Dynamic Programming · towardsdatascience.com · 3d
A Comparative Theoretical Analysis of Entropy Control Methods in Reinforcement Learning
📊 Dynamic Programming · arxiv.org · 15h
StructRL: Recovering Dynamic Programming Structure from Learning Dynamics in Distributional Reinforcement Learning
📊 Dynamic Programming · arxiv.org · 1d
Self-Distilled Reinforcement Learning for Co-Evolving Agentic Recommender Systems
⚓ Anchors · arxiv.org · 15h
From Reasoning to Agentic: Credit Assignment in Reinforcement Learning for Large Language Models
💬 Prompt Engineering · arxiv.org · 1d
You Only Judge Once: Multi-response Reward Modeling in a Single Forward Pass
💬 Prompt Engineering · arxiv.org · 15h
Adaptive Simulation Experiment for LLM Policy Optimization
💬 Prompt Engineering · arxiv.org · 1d
Deep Learning for Sequential Decision Making under Uncertainty: Foundations, Frameworks, and Frontiers
🔬 Deep Learning · arxiv.org · 15h
Thompson Sampling for Infinite-Horizon Discounted Decision Processes
📊 Dynamic Programming · arxiv.org · 5d
Relax: An Asynchronous Reinforcement Learning Engine for Omni-Modal Post-Training at Scale
🌊 Noria · arxiv.org · 15h
Trust Your Memory: Verifiable Control of Smart Homes through Reinforcement Learning with Multi-dimensional Rewards
🏠 Home Automation · arxiv.org · 15h
Gaussian Approximation for Asynchronous Q-learning
📊 Optimization · arxiv.org · 5d
Smart Commander: A Hierarchical Reinforcement Learning Framework for Fleet-Level PHM Decision Optimization
📊 Dynamic Programming · arxiv.org · 5d
DROP: Distributional and Regular Optimism and Pessimism for Reinforcement Learning
📊 Optimization · arxiv.org · 5d
Value Mirror Descent for Reinforcement Learning
📊 Optimization · arxiv.org · 6d
Vintix II: Decision Pre-Trained Transformer is a Scalable In-Context Reinforcement Learner
🤖 Transformers · arxiv.org · 6d
Enhancing sample efficiency in reinforcement-learning-based flow control: replacing the critic with an adaptive reduced-order model
🌀 Naiad · arxiv.org · 6d
Target Policy Optimization
📊 Dynamic Programming · arxiv.org · 6d
Offline RL for Adaptive Policy Retrieval in Prior Authorization
🎲 Deterministic Simulation · arxiv.org · 6d