Scour
🎮 Reinforcement Learning
RL, reward functions, policy gradient, agents, simulation
Scoured 291 posts in 13.0 ms
Policy Gradient for Continuous-Time Robust Markov Decision Processes
🤖 LLM Inference · Academic · arxiv.org · 6 days ago
Can Reinforcement Learning Help LLMs Discover New Reasoning Strategies?
🧠 LLM · pub.towardsai.net · 1 day ago
Fast and Highly Expressive Policy Learning for Offline Reinforcement Learning via Bootstrapped Flow Q-Learning
🤖 Game AI · Academic · arxiv.org · 18 hours ago
Geometrically Averaged Hard Target Updates for Linear Q-Learning
🤖 LLM Inference · Academic · arxiv.org · 18 hours ago
Towards End to End Motion Planning and Execution for Autonomous Underwater Vehicles Using Reinforcement Learning
🤝 Human-AI Collaboration · Academic · arxiv.org · 1 day ago
Self-Distilled Policy Gradient
📡 Information Theory · Academic · arxiv.org · 6 days ago
Reasoning or Memorization? Direction-Aware Diversity Exploration in LLM Reinforcement Learning
🧠 LLM · Academic · arxiv.org · 18 hours ago
Offline Reinforcement Learning for Plasma Control in Nuclear Fusion: Codebase and Benchmark
🤝 Human-AI Collaboration · Academic · arxiv.org · 1 day ago
Test-Time Gradient Guidance of Flow Policies in Reinforcement Learning
🤖 Game AI · Academic · arxiv.org · 18 hours ago
HARBOR: A Harness Framework for Agentic Robot Reinforcement Learning
🎯 AI Agents · Academic · arxiv.org · 1 day ago
Self-Optimizing Control of Continuous Processes Based on Reinforcement Learning
🤖 Agentic AI · Academic · arxiv.org · 6 days ago
SHAPO: Sharpness-Aware Policy Optimization for Safe Exploration
🛡️ AI Safety · Academic · arxiv.org · 18 hours ago
Uncertainty-Aware LLM-Guided Policy Shaping for Sparse-Reward Reinforcement Learning
🧠 LLM · Academic · arxiv.org · 2 days ago
Dmsh: A Multi-Agent Reinforcement Learning Framework for All-Quad Mesh Generation
🔢 Numerical Methods · Academic · arxiv.org · 18 hours ago
On Advantage Estimates for Max@K Policy Gradients
🧠 LLM · Academic · arxiv.org · 5 days ago
Reinforcement Learning for Flow-Matching Policies with Density Transport
🤖 Game AI · Academic · arxiv.org · 1 day ago
When RL Fails after SFT: Rejuvenating Model Plasticity for Robust SFT-to-RL Handoff
🧠 LLM · Academic · arxiv.org · 18 hours ago
Structure-Conditioned Actor-Critic Branches for Quality-Diversity Reinforcement Learning
🤖 Game AI · Academic · arxiv.org · 1 day ago
SocraticPO: Policy Optimization via Interactive Guidance
🧠 LLM · Academic · arxiv.org · 18 hours ago
RL Excursions during Pre-Training: Re-examining Policy Optimization for LLM training
🧠 LLM · Academic · arxiv.org · 6 days ago