Reinforcement Learning
RL, reward, policy gradient, agent, Q-learning
Scoured 431 posts in 7.3 ms

3SPO: State-Score-Supervised Policy Optimization for LLM Agents
Evolutionary Computation · Academic · arxiv.org · 14 hours ago

Self-evolving LLM agents with in-distribution Optimization
Evolutionary Computation · Academic · arxiv.org · 2 days ago

SHAPO: Sharpness-Aware Policy Optimization for Safe Exploration
Active Inference · Academic · arxiv.org · 14 hours ago

Policy Gradient for Continuous-Time Robust Markov Decision Processes
Computational Mechanics · Academic · arxiv.org · 6 days ago

GIFT: LLM-Guided State-Reward Interface for Financial Reinforcement Learning
Evolutionary Computation · Academic · arxiv.org · 1 day ago

Dmsh: A Multi-Agent Reinforcement Learning Framework for All-Quad Mesh Generation
Computational Mechanics · Academic · arxiv.org · 14 hours ago

Reinforcement Learning for Flow-Matching Policies with Density Transport
Active Inference · Academic · arxiv.org · 1 day ago

Beyond Uniform Token-Level Trust Region in LLM Reinforcement Learning
Computational Mechanics · Academic · arxiv.org · 14 hours ago

Dynamic Multi-Pair Trading Strategy in Cryptocurrency Markets with Deep Reinforcement Learning
Synaptic pruning · Academic · arxiv.org · 6 days ago

Self-Paced Curriculum Reinforcement Learning for Autonomous Superbike Racing in Simulation
Open-Ended Learning · Academic · arxiv.org · 1 day ago

From Ticks to Flows: Dynamics of Neural Reinforcement Learning in Continuous Environments
Continual Learning · Academic · arxiv.org · 6 days ago

Deep reinforcement learning for process design: Review and perspective
Continual Learning · Academic · arxiv.org · 1 day ago

Progress-SQL: Improving Reinforcement Learning for Text-to-SQL via Progressive Rewards
Continual Learning · Academic · arxiv.org · 2 days ago

Enhancing the MADDPG Algorithm for Multi-Agent Learning via Action Inference and Importance Sampling
Active Inference · Academic · arxiv.org · 6 days ago

HARBOR: A Harness Framework for Agentic Robot Reinforcement Learning
Developmental Robotics · Academic · arxiv.org · 1 day ago

Drag reduction or reward hacking? Recurrent multi-agent reinforcement learning that earns its reward
Active Inference · Academic · arxiv.org · 5 days ago

Path Planning Using Deep Deterministic Policy Gradient: A Reinforcement Learning Approach
Computational Mechanics · Academic · arxiv.org · 1 day ago

Rethinking the Divergence Regularization in LLM RL
Active Inference · Academic · arxiv.org · 1 day ago

Merging model-based control with multi-agent reinforcement learning for multi-agent cooperative teaming strategies
Computational Mechanics · Academic · arxiv.org · 5 days ago

Claw-R1: A Step-Level Data Middleware System for Agentic Reinforcement Learning
Developmental Robotics · Academic · arxiv.org · 1 day ago