Reinforcement Learning

Policy Gradient for Continuous-Time Robust Markov Decision Processes

 🤖LLM Inference  Content type: Academic
arxiv.org·

Can Reinforcement Learning Help LLMs Discover New Reasoning Strategies?

 🧠LLM
pub.towardsai.net·

Fast and Highly Expressive Policy Learning for Offline Reinforcement Learning via Bootstrapped Flow Q-Learning

 🤖Game AI  Content type: Academic
arxiv.org·

Geometrically Averaged Hard Target Updates for Linear Q-Learning

 🤖LLM Inference  Content type: Academic
arxiv.org·

Towards End to End Motion Planning and Execution for Autonomous Underwater Vehicles Using Reinforcement Learning

 🤝Human-AI Collaboration  Content type: Academic
arxiv.org·

Self-Distilled Policy Gradient

 📡Information Theory  Content type: Academic
arxiv.org·

Reasoning or Memorization? Direction-Aware Diversity Exploration in LLM Reinforcement Learning

 🧠LLM  Content type: Academic
arxiv.org·

Offline Reinforcement Learning for Plasma Control in Nuclear Fusion: Codebase and Benchmark

 🤝Human-AI Collaboration  Content type: Academic
arxiv.org·

Test-Time Gradient Guidance of Flow Policies in Reinforcement Learning

 🤖Game AI  Content type: Academic
arxiv.org·

HARBOR: A Harness Framework for Agentic Robot Reinforcement Learning

 🎯AI Agents  Content type: Academic
arxiv.org·

Self-Optimizing Control of Continuous Processes Based on Reinforcement Learning

 🤖Agentic AI  Content type: Academic
arxiv.org·

SHAPO: Sharpness-Aware Policy Optimization for Safe Exploration

 🛡️AI Safety  Content type: Academic
arxiv.org·

Uncertainty-Aware LLM-Guided Policy Shaping for Sparse-Reward Reinforcement Learning

 🧠LLM  Content type: Academic
arxiv.org·

Dmsh: A Multi-Agent Reinforcement Learning Framework for All-Quad Mesh Generation

 🔢Numerical Methods  Content type: Academic
arxiv.org·

On Advantage Estimates for Max@K Policy Gradients

 🧠LLM  Content type: Academic
arxiv.org·

Reinforcement Learning for Flow-Matching Policies with Density Transport

 🤖Game AI  Content type: Academic
arxiv.org·

When RL Fails after SFT: Rejuvenating Model Plasticity for Robust SFT-to-RL Handoff

 🧠LLM  Content type: Academic
arxiv.org·

Structure-Conditioned Actor-Critic Branches for Quality-Diversity Reinforcement Learning

 🤖Game AI  Content type: Academic
arxiv.org·

SocraticPO: Policy Optimization via Interactive Guidance

 🧠LLM  Content type: Academic
arxiv.org·

RL Excursions during Pre-Training: Re-examining Policy Optimization for LLM training

 🧠LLM  Content type: Academic
arxiv.org·
