🎯 Reinforcement Learning
RL, reward learning, policy gradient, offline RL
Scoured 74 posts in 5.8 ms
Space-sampled Value Decay: Forgetting Mechanisms for Non-stationary Deep Reinforcement Learning
🌐 World Models · Content type: Academic · arxiv.org · 7 hours ago
Fast and Highly Expressive Policy Learning for Offline Reinforcement Learning via Bootstrapped Flow Q-Learning
🌐 World Models · Content type: Academic · arxiv.org · 1 day ago
Geometrically Averaged Hard Target Updates for Linear Q-Learning
🌐 World Models · Content type: Academic · arxiv.org · 1 day ago
TT-DAC-PS: Twin-Target Deterministic Actor-Critic with Policy Smoothing for Optimal Trade Execution
🌐 World Models · Content type: Academic · arxiv.org · 2 days ago
Retry Policy Gradients in Continuous Action Spaces
🤖 Robot Learning · Content type: Academic · arxiv.org · 6 days ago
Offline Reinforcement Learning for Plasma Control in Nuclear Fusion: Codebase and Benchmark
🤖 Robot Learning · Content type: Academic · arxiv.org · 2 days ago
Path Planning Using Deep Deterministic Policy Gradient: A Reinforcement Learning Approach
🌐 World Models · Content type: Academic · arxiv.org · 2 days ago
Self-evolving LLM agents with in-distribution Optimization
🌐 World Models · Content type: Academic · arxiv.org · 3 days ago
Semi-Offline Reinforcement Learning for Optimized Text Generation
🌐 World Models · Content type: Academic · arxiv.org · 6 days ago
Failure Modes of Deep Multi-Agent RL in Asynchronous Pricing: Reproducible Triggers, Trace Diagnostics, and a Partial Fix
🌐 World Models · Content type: Academic · arxiv.org · 1 day ago
Development of COVID-19 Booster Vaccine Policy by Microsimulation and Q-learning
🌐 World Models · Content type: Academic · arxiv.org · 1 day ago
Merging model-based control with multi-agent reinforcement learning for multi-agent cooperative teaming strategies
🌐 World Models · Content type: Academic · arxiv.org · 6 days ago
SHAPO: Sharpness-Aware Policy Optimization for Safe Exploration
🌐 World Models · Content type: Academic · arxiv.org · 1 day ago
Test-Time Gradient Guidance of Flow Policies in Reinforcement Learning
🤖 Robot Learning · Content type: Academic · arxiv.org · 1 day ago
On Advantage Estimates for Max@K Policy Gradients
🌐 World Models · Content type: Academic · arxiv.org · 6 days ago
Learning Predictive Control with Deep Koopman Operators for Autonomous Vehicle Motion Planning
🌐 World Models · Content type: Academic · arxiv.org · 2 days ago
Drag reduction or reward hacking? Recurrent multi-agent reinforcement learning that earns its reward
🌐 World Models · Content type: Academic · arxiv.org · 6 days ago
Dmsh: A Multi-Agent Reinforcement Learning Framework for All-Quad Mesh Generation
🌐 World Models · Content type: Academic · arxiv.org · 1 day ago
Self-Paced Curriculum Reinforcement Learning for Autonomous Superbike Racing in Simulation
🌐 World Models · Content type: Academic · arxiv.org · 2 days ago
Towards End to End Motion Planning and Execution for Autonomous Underwater Vehicles Using Reinforcement Learning
🌐 World Models · Content type: Academic · arxiv.org · 2 days ago