Reinforcement Learning
RL, Agents, Policy Optimization, Reward Functions
Deep reinforcement learning for process design: Review and perspective
Machine Learning · Content type: Academic · arxiv.org · 1 day ago
Beyond Uniform Token-Level Trust Region in LLM Reinforcement Learning
LLM · Content type: Academic · arxiv.org · 22 hours ago
SocraticPO: Policy Optimization via Interactive Guidance
AI · Content type: Academic · arxiv.org · 22 hours ago
Cooperative Long Rope Skipping via Multi-Agent Reinforcement Learning
AI · Content type: Academic · arxiv.org · 1 day ago
Failure Modes of Deep Multi-Agent RL in Asynchronous Pricing: Reproducible Triggers, Trace Diagnostics, and a Partial Fix
LLM · Content type: Academic · arxiv.org · 22 hours ago
Path Planning Using Deep Deterministic Policy Gradient: A Reinforcement Learning Approach
AI · Content type: Academic · arxiv.org · 1 day ago
Mult-DPO: Multinomial Direct Preference Optimization for Recommender Systems
LLM · Content type: Academic · arxiv.org · 22 hours ago
Self-evolving LLM agents with in-distribution Optimization
LLM · Content type: Academic · arxiv.org · 2 days ago
MODIP: Efficient Model-Based Optimization for Diffusion Policies
Machine Learning · Content type: Academic · arxiv.org · 22 hours ago
An Agency-Transferring Model-Free Policy Enhancement Technique
Machine Learning · Content type: Academic · arxiv.org · 1 day ago
Policy Gradient for Continuous-Time Robust Markov Decision Processes
LLM · Content type: Academic · arxiv.org · 6 days ago
Mitigating Bias in Low-SNR Financial Reinforcement Learning via Quantum Representations
Machine Learning · Content type: Academic · arxiv.org · 22 hours ago
The Neutral Mask: How RLHF Provides Shallow Alignment while Leaving Partisan Structure Intact in a Large Language Model
LLM · Content type: Academic · arxiv.org · 1 day ago
Enhancing the MADDPG Algorithm for Multi-Agent Learning via Action Inference and Importance Sampling
LLM · Content type: Academic · arxiv.org · 6 days ago
Self-Evolving Scientific Agent Discovers Generalizable Physically-Reasoned Fluid Control
AI · Content type: Academic · arxiv.org · 1 day ago
SARM2: Multi-Task Stage Aware Reward Modeling for Self Improving Robotic Manipulation
AI · Content type: Academic · arxiv.org · 22 hours ago
Learning Predictive Control with Deep Koopman Operators for Autonomous Vehicle Motion Planning
AI · Content type: Academic · arxiv.org · 1 day ago
TRACE: A Unified Rollout Budget Allocation Framework for Efficient Agentic Reinforcement Learning
AI · Content type: Academic · arxiv.org · 22 hours ago
Merging model-based control with multi-agent reinforcement learning for multi-agent cooperative teaming strategies
AI · Content type: Academic · arxiv.org · 5 days ago
Variational Proximal Policy Optimization
Machine Learning · Content type: Academic · arxiv.org · 1 day ago