Scour
🎮 Reinforcement Learning
RL, RLHF, reward model, policy gradient
Scoured 241 posts in 5.2 ms
TT-DAC-PS: Twin-Target Deterministic Actor-Critic with Policy Smoothing for Optimal Trade Execution
📈 Quantitative Finance · Content type: Academic · arxiv.org · 2 days ago
Performance Variation in Deep Reinforcement Learning
📉 Deep Learning · Content type: Academic · arxiv.org · 3 days ago
Space-sampled Value Decay: Forgetting Mechanisms for Non-stationary Deep Reinforcement Learning
🧠 AI Research · Content type: Academic · arxiv.org · 9 hours ago
Deep Reinforcement Learning for Adaptive Power Allocation in ISAC Systems with Mobile Target
🔄 Transformers · Content type: Academic · arxiv.org · 9 hours ago
Towards End to End Motion Planning and Execution for Autonomous Underwater Vehicles Using Reinforcement Learning
📉 Deep Learning · Content type: Academic · arxiv.org · 2 days ago
Reinforcement Learning Disrupts Gradient-Based Adversarial Optimization
📉 Deep Learning · Content type: Academic · arxiv.org · 9 hours ago
Uncertainty-Aware LLM-Guided Policy Shaping for Sparse-Reward Reinforcement Learning
🔄 Transformers · Content type: Academic · arxiv.org · 3 days ago
A Unifying Lens on Reward Uncertainty in RLHF
⚙️ Model Training · Content type: Academic · arxiv.org · 2 days ago
Deterministic Policy Gradient for Learning Equilibrium in Time-Inconsistent Control Problems
📈 Quantitative Finance · Content type: Academic · arxiv.org · 9 hours ago
Self-Paced Curriculum Reinforcement Learning for Autonomous Superbike Racing in Simulation
🤖 AI Agents · Content type: Academic · arxiv.org · 2 days ago
Representation Learning Enables Scalable Multitask Deep Reinforcement Learning
📉 Deep Learning · Content type: Academic · arxiv.org · 6 days ago
Generalization Hacking: Models Can Game Reinforcement Learning by Preventing Behavioral Generalization
🔄 Transformers · Content type: Academic · arxiv.org · 9 hours ago
Variational Proximal Policy Optimization
📉 Deep Learning · Content type: Academic · arxiv.org · 2 days ago
UniIntervene: Agentic Intervention for Efficient Real-World Reinforcement Learning
🤖 AI Agents · Content type: Academic · arxiv.org · 9 hours ago
Structure-Conditioned Actor-Critic Branches for Quality-Diversity Reinforcement Learning
📐 Scaling Laws · Content type: Academic · arxiv.org · 2 days ago
On Advantage Estimates for Max@K Policy Gradients
📐 Scaling Laws · Content type: Academic · arxiv.org · 6 days ago
Harnessing Routing Foresight for Micro-step-level MoE load balancing in RL Post-training
⚙️ Model Training · Content type: Academic · arxiv.org · 9 hours ago
DriveReward: A Comprehensive Dataset and Generative Vision-Language Reward Model for Autonomous Driving
🖥️ ML Systems · Content type: Academic · arxiv.org · 2 days ago
Phi-Actor-Critic: Steering General-Sum Games to Pareto-Efficient Correlated Equilibria
🤖 AI Agents · Content type: Academic · arxiv.org · 9 hours ago
Retry Policy Gradients in Continuous Action Spaces
📉 Deep Learning · Content type: Academic · arxiv.org · 6 days ago