Scour
🎮 Reinforcement Learning
RL, RLHF, reward model, policy gradient
Scoured 475 posts in 6.9 ms
Verifiable Environments Are LEGO Bricks: Recursive Composition for Reasoning Generalization
📐 Scaling Laws
Content type: Academic
arxiv.org · 13 hours ago
Geometrically Averaged Hard Target Updates for Linear Q-Learning
📐 Scaling Laws
Content type: Academic
arxiv.org · 1 day ago
Architecture-Aware Reinforcement Learning Makes Sliding-Window Attention Competitive in Math Reasoning
⚙️ Model Training
Content type: Academic
arxiv.org · 13 hours ago
Discovering Interpretable Multi-Parameter Control Policies for Evolutionary Algorithms Using Deep Reinforcement Learning
🔍 Interpretability
Content type: Academic
arxiv.org · 1 day ago
Uncertainty-Aware LLM-Guided Policy Shaping for Sparse-Reward Reinforcement Learning
🔄 Transformers
Content type: Academic
arxiv.org · 3 days ago
Variational Proximal Policy Optimization
📉 Deep Learning
Content type: Academic
arxiv.org · 2 days ago
IAPO: Input Attribution-Aware Policy Optimization for Tool Use in Small Multimodal Agents
🤖 AI Agents
Content type: Academic
arxiv.org · 13 hours ago
Representation Learning Enables Scalable Multitask Deep Reinforcement Learning
📉 Deep Learning
Content type: Academic
arxiv.org · 6 days ago
Structure-Conditioned Actor-Critic Branches for Quality-Diversity Reinforcement Learning
📐 Scaling Laws
Content type: Academic
arxiv.org · 2 days ago
Test-Time Gradient Guidance of Flow Policies in Reinforcement Learning
⚙️ Model Training
Content type: Academic
arxiv.org · 1 day ago
Multi-agent rendezvous in fluid flows via reinforcement learning
🤖 AI Agents
Content type: Academic
arxiv.org · 13 hours ago
On Advantage Estimates for Max@K Policy Gradients
📐 Scaling Laws
Content type: Academic
arxiv.org · 6 days ago
Representation-Aware Advantage Estimation: Your Reward Model Provides More Than A Scalar Output
⚙️ Model Training
Content type: Academic
arxiv.org · 1 day ago
Improving Generalization and Data Efficiency with Diffusion in Offline Multi-agent RL
🤖 AI Agents
Content type: Academic
arxiv.org · 13 hours ago
UNIQ: Conformal Calibration for Adaptive Conservatism in Offline Reinforcement Learning
🧠 AI Research
Content type: Academic
arxiv.org · 2 days ago
Retry Policy Gradients in Continuous Action Spaces
📉 Deep Learning
Content type: Academic
arxiv.org · 6 days ago
DriveReward: A Comprehensive Dataset and Generative Vision-Language Reward Model for Autonomous Driving
🖥️ ML Systems
Content type: Academic
arxiv.org · 2 days ago
SVoT: State-aware Visualization-of-Thought for Spatial Reasoning via Reinforcement Learning
🔄 Transformers
Content type: Academic
arxiv.org · 13 hours ago
Rethinking the Divergence Regularization in LLM RL
💬 LLMs
Content type: Academic
arxiv.org · 2 days ago
Transformer-Enhanced Reinforcement Learning: Fundamentals and Applications in Communication Networks
🔄 Transformers
Content type: Academic
arxiv.org · 6 days ago