Scour
🎯 Reinforcement Learning
Q-learning, Policy Gradient, Reward Functions, TD Learning
Scoured 556 posts in 11.6 ms
Self-Play Reinforcement Learning under Imperfect Information in Big 2
🎮 Game Theory · arxiv.org · 3d
PPO vs SAC: 1-GPU Memory & Compute Cost Benchmark
🚀 Performance · tildalice.io · 6d
The Challenges of Using Reinforcement Learning for Controlling Industrial Energy Systems
💬 Prompt Engineering · arxiv.org · 16h
Convergence of Two-Timescale Markovian Stochastic Approximations with Applications in Reinforcement Learning
📊 Dynamic Programming · arxiv.org · 16h
CleanRL vs Stable Baselines3: PPO Training 2.3x Faster
⏪ Deoptimization · tildalice.io · 5d
Zero Collapse: A Failure Mode of Policy Gradient Methods in Discontinuous Reward Environments
⚓ Anchors · arxiv.org · 16h
Survival Reinforcement Learning: Toward Scalable Self-Supervised RL
⚓ Anchors · arxiv.org · 16h
Feat2Go: Visual Feature-Grounded Value Estimation for Embodied Reinforcement Learning
📱 Edge AI · arxiv.org · 16h
DeepSeekMath Meets Order Book: Group-Aware Policy Optimization for High-Frequency Directional Trading
⚙️ LMAX Architecture · arxiv.org · 6d
Refined Analysis of Entropy-Regularized Actor-Critic
📊 Dynamic Programming · arxiv.org · 6d
Reinforcement Learning from Denoising Feedback
📊 Dynamic Programming · arxiv.org · 6d
Efficient On-policy Visual-RL via Stochastic Decoupled Policy Gradient
🤖 TVM · arxiv.org · 5d
ProRL: Effective Reinforcement Learning for Proactive Recommendation via Rectified Policy Gradient Estimation
📊 HyperLogLog · arxiv.org · 4d
Explicit Critic Guidance for Aligning Diffusion Models
⚓ Anchors · arxiv.org · 4d
Robust Koopman Control Barrier Filters for Safe Actor-Critic Reinforcement Learning
🤖 Robotics · arxiv.org · 5d
Moment Matching Q-Learning
📱 Edge AI · arxiv.org · 3d
Global Convergence of Wasserstein Policy Gradient for Entropy-Regularized Reinforcement Learning
📊 Optimization · arxiv.org · 6d
Commit to the Bit: Reactive Reinforcement Learning Done Right
🎲 Deterministic Simulation · arxiv.org · 4d
When LLM Reward Design Fails: Diagnostic-Driven Refinement for Sparse Structured RL
⚡ Incremental Computation · arxiv.org · 3d
Reinforcement Learning with Robust Rubric Rewards
📊 Dynamic Programming · arxiv.org · 3d