🎮 Reinforcement Learning
Q-Learning, Policy Gradient, Reward Systems, Game AI, Robotics
Scoured 203992 posts in 63.7 ms
Button‑pushing explorers: How to grasp that AI agents can do amazing things while knowing nothing
🔍 AI Interpretability · techxplore.com · 3d
AIS: Adaptive Importance Sampling for Quantized RL
🔍 AI Interpretability · arxiv.org · 1d
Self-Supervised On-Policy Reinforcement Learning via Contrastive Proximal Policy Optimisation
⚙ Context engineering · arxiv.org · 2d
Parallelizing Counterfactual Regret Minimization
⚙ Context engineering · arxiv.org · 1d
Reward-Conditioned Reinforcement Learning
⚙ Context engineering · arxiv.org · 4d
Action-Conditioned Risk Gating for Safety-Critical Control under Partial Observability
⚙ Context engineering · arxiv.org · 1d
GAGPO: Generalized Advantage Grouped Policy Optimization
⚙ Context engineering · arxiv.org · 2d
Skill-R1: Agent Skill Evolution via Reinforcement Learning
⚙ Context engineering · arxiv.org · 4d
Critic-Driven Voronoi-Quantization for Distilling Deep RL Policies to Explainable Models
⚙ Context engineering · arxiv.org · 1d
Learning from Failures: Correction-Oriented Policy Optimization with Verifiable Rewards
⚙ Context engineering · arxiv.org · 1d
Improved Model-based Reinforcement Learning with Smooth Kernels
⚙ Context engineering · arxiv.org · 5d
Second-Order Actor-Critic Methods for Discounted MDPs via Policy Hessian Decomposition
⚙ Context engineering · arxiv.org · 1d
A Switching System Theory of Q-Learning with Linear Function Approximation
🤝 Multi-Agent Systems · arxiv.org · 3d
Ergodic Imitation for Adaptive Exploration around Demonstrations
⚙ Context engineering · arxiv.org · 1d
Submodular Multi-Agent Policy Learning for Online Distributed Task Allocation in Open Multi-Agent Systems
🤝 Multi-Agent Systems · arxiv.org · 2d
Your Language Model is Its Own Critic: Reinforcement Learning with Value Estimation from Actor's Internal States
⚙ Context engineering · arxiv.org · 5d
Driving Intents Amplify Planning-Oriented Reinforcement Learning
⚙ Context engineering · arxiv.org · 2d
Reinforcement Learning Measurement Model
⚙ Context engineering · arxiv.org · 4d
Learning to Build the Environment: Self-Evolving Reasoning RL via Verifiable Environment Synthesis
⚙ Context engineering · arxiv.org · 1d
MetaAgent-X: Breaking the Ceiling of Automatic Multi-Agent Systems via End-to-End Reinforcement Learning
🤝 Multi-Agent Systems · arxiv.org · 1d