Reinforcement Learning
Q-Learning, Policy Gradient, Reward Systems, Game AI, Robotics
Deep reinforcement learning for process design: Review and perspective
TensorFlow · Content type: Academic · arxiv.org · 1 day ago
COP-Q: Safety-First Reinforcement Learning for Robot Control via Cholesky-Ordered Projection
Machine Learning · Content type: Academic · arxiv.org · 6 days ago
A Unifying Lens on Reward Uncertainty in RLHF
AI · Content type: Academic · arxiv.org · 1 day ago
Path Planning Using Deep Deterministic Policy Gradient: A Reinforcement Learning Approach
AI · Content type: Academic · arxiv.org · 1 day ago
Exact Unlearning in Reinforcement Learning
AI · Content type: Academic · arxiv.org · 6 days ago
Uncertainty-Aware LLM-Guided Policy Shaping for Sparse-Reward Reinforcement Learning
TensorFlow · Content type: Academic · arxiv.org · 2 days ago
Progress-SQL: Improving Reinforcement Learning for Text-to-SQL via Progressive Rewards
TensorFlow · Content type: Academic · arxiv.org · 2 days ago
Selective-Advantage Entropy-Adaptive Horizon GRPO: Asymmetric Token-Level Discounting for Efficient Reinforcement Learning of Language Models
Transformers · Content type: Academic · arxiv.org · 5 days ago
QnRL: Quantum-Native Reinforcement Learning
Photography · Content type: Academic · arxiv.org · 1 day ago
Fog of Love: Engineering Virtuous Agent Behavior with Affinity-based Reinforcement Learning in a Game Environment
AI · Content type: Academic · arxiv.org · 6 days ago
Performance Variation in Deep Reinforcement Learning
TensorFlow · Content type: Academic · arxiv.org · 2 days ago
Drag reduction or reward hacking? Recurrent multi-agent reinforcement learning that earns its reward
Transformers · Content type: Academic · arxiv.org · 5 days ago
EvalStop: Using World Feedback to Detect and Correct Reward Overoptimization in Multi-Tenant RLHF Platforms
AI · Content type: Academic · arxiv.org · 6 days ago
Explainably Safe Reinforcement Learning
TensorFlow · Content type: Academic · arxiv.org · 6 days ago
GARL: Game-Theoretic Reinforcement Learning for Multi-Agent Strategic Prioritisation
Markov Chains · Content type: Academic · arxiv.org · 6 days ago
Self-Optimizing Control of Continuous Processes Based on Reinforcement Learning
Machine Learning · Content type: Academic · arxiv.org · 6 days ago
Enhancing the MADDPG Algorithm for Multi-Agent Learning via Action Inference and Importance Sampling
TensorFlow · Content type: Academic · arxiv.org · 6 days ago
Merging model-based control with multi-agent reinforcement learning for multi-agent cooperative teaming strategies
AI · Content type: Academic · arxiv.org · 5 days ago
RUBAS: Rubric-Based Reinforcement Learning for Agent Safety
AI · Content type: Academic · arxiv.org · 6 days ago
Reinforcement Learning from Rich Feedback with Distributional DAgger
TensorFlow · Content type: Academic · arxiv.org · 6 days ago