Scour
🎮 Reinforcement Learning
Q-Learning, Policy Gradient, Reward Systems, Game AI, Robotics
Policy Gradient Methods for Non-Markovian Reinforcement Learning
⚙ Context engineering · arxiv.org · 4d
Reinforcement Learning: An Introduction (2nd Edition)
🔍 AI Interpretability · chizkidd.github.io · 13h
rl for red teaming: training models to attack and defend themselves
⚙ Context engineering · castform.com · 1d · Hacker News
Reinforcement Learning, Agency and Taste
⚙ Context engineering · lesswrong.com · 3d
'Try, Score, Change': Reinforcement Learning for Children
⚙ Context engineering · gwern.net · 6d · Hacker News
What is the difference between supervised, unsupervised, and reinforcement learning?
⚙ Context engineering · medium.com · 4d
Button-pushing explorers: How to grasp that AI agents can do amazing things while knowing nothing
🔍 AI Interpretability · theconversation.com · 3d
Learning, Fast and Slow: LLMs That Adapt Continually
⚙ Context engineering · gepa-ai.github.io · 5d · Hacker News
Q-Flow: Stable and Expressive Reinforcement Learning with Flow-Based Policy
⚙ Context engineering · arxiv.org · 2d
Reinforcement Learning for Optimal Execution
🤖 agents · jonathankinlay.com · 5d
Meta's Hyperagents and Self-Correcting Agents
⚙ Context engineering · jdsemrau.substack.com · 4d · Substack
Self-Distilled Agentic Reinforcement Learning
⚙ Context engineering · arxiv.org · 1d
Multi-Objective and Mixed-Reward Reinforcement Learning via Reward-Decorrelated Policy Optimization
⚙ Context engineering · arxiv.org · 2d
Policy Optimization in Hybrid Discrete-Continuous Action Spaces via Mixed Gradients
🤝 Multi-Agent Systems · arxiv.org · 1d
Boosting Reinforcement Learning with Verifiable Rewards via Randomly Selected Few-Shot Guidance
⚙ Context engineering · arxiv.org · 1d
3D RL-DWA: A Hybrid Reinforcement Learning and Dynamic Window Approach for Goal-Directed Local Navigation in Multi-DoF Robots
🤝 Multi-Agent Systems · arxiv.org · 2d
Synthesizing POMDP Policies: Sampling Meets Model-checking via Learning
⚙ Context engineering · arxiv.org · 1d
Revisiting Reinforcement Learning with Verifiable Rewards from a Contrastive Perspective
⚙ Context engineering · arxiv.org · 2d
ROAD: Adaptive Data Mixing for Offline-to-Online Reinforcement Learning via Bi-Level Optimization
⚙ Context engineering · arxiv.org · 1d
Learning When to Act: Communication-Efficient Reinforcement Learning via Run-Time Assurance
🧪 Property-based Testing · arxiv.org · 2d