Reinforcement Learning


Reinforcement Learning for Flow-Matching Policies with Density Transport

 🤖Machine Learning  Content type: Academic
arxiv.org

Uncertainty-Aware LLM-Guided Policy Shaping for Sparse-Reward Reinforcement Learning

 🤖Transformers  Content type: Academic
arxiv.org

Event-Driven Reinforcement Learning Enables Long-Horizon Control in Semiconductor Fabrication

 👁️Attention Mechanisms  Content type: Academic
arxiv.org

Cooperative Long Rope Skipping via Multi-Agent Reinforcement Learning

 🤖AI  Content type: Academic
arxiv.org

MDP-GRPO: Stabilized Group Relative Policy Optimization for Multi-Constraint Instruction Following

 ⚙️Systems Programming  Content type: Academic
arxiv.org

Self-Evolving Scientific Agent Discovers Generalizable Physically-Reasoned Fluid Control

 📚Compilers  Content type: Academic
arxiv.org

Policy-Conditioned Counterfactual Credit for Verifiable Reinforcement Learning of Long-Horizon Language Agents

 🤖AI  Content type: Academic
arxiv.org

3SPO: State-Score-Supervised Policy Optimization for LLM Agents

 🤖AI  Content type: Academic
arxiv.org

An Agency-Transferring Model-Free Policy Enhancement Technique

 🤖Machine Learning  Content type: Academic
arxiv.org

Learning to replenish: A hybrid deep reinforcement learning for dynamic inventory management in the pharmaceutical supply chains

 🤖Machine Learning  Content type: Academic
arxiv.org

Geometry-Aware Reinforcement Learning for 2D Irregular Nesting

 SIMD Optimization  Content type: Academic
arxiv.org

Distilling LLM Reasoning into an Interpretable Policy Tree for Human-AI Collaboration

 🤖Transformers  Content type: Academic
arxiv.org

Alpha-RTL: Test-Time Training for RTL Hardware Optimization

 ⚙️JIT Compilation  Content type: Academic
arxiv.org

GIFT: LLM-Guided State-Reward Interface for Financial Reinforcement Learning

 🤖AI  Content type: Academic
arxiv.org

GenPO++: Generative Policy Optimization with Jacobian-free Likelihood Ratios

 🤖AI  Content type: Academic
arxiv.org

Failure Modes of Deep Multi-Agent RL in Asynchronous Pricing: Reproducible Triggers, Trace Diagnostics, and a Partial Fix

 🤖AI  Content type: Academic
arxiv.org

Beyond Uniform Token-Level Trust Region in LLM Reinforcement Learning

 🤖Transformers  Content type: Academic
arxiv.org

Learning Predictive Control with Deep Koopman Operators for Autonomous Vehicle Motion Planning

 🤖Robotics  Content type: Academic
arxiv.org

Learning Multi-Agent Communication Protocol: Study on Information Entropy Efficiency in MARL

 🤖AI  Content type: Academic
arxiv.org

Claw-R1: A Step-Level Data Middleware System for Agentic Reinforcement Learning

 🤖AI  Content type: Academic
arxiv.org