Reinforcement Learning


Beyond Uniform Token-Level Trust Region in LLM Reinforcement Learning

 🎯RLHF  Content type: Academic
arxiv.org

Path Planning Using Deep Deterministic Policy Gradient: A Reinforcement Learning Approach

 🤖Game AI  Content type: Academic
arxiv.org

Self-Distilled Policy Gradient

 🎯RLHF  Content type: Academic
arxiv.org

3SPO: State-Score-Supervised Policy Optimization for LLM Agents

 🎯AI Agents  Content type: Academic
arxiv.org

An Agency-Transferring Model-Free Policy Enhancement Technique

 🤖AI  Content type: Academic
arxiv.org

TRACE: A Unified Rollout Budget Allocation Framework for Efficient Agentic Reinforcement Learning

 🎯RLHF  Content type: Academic
arxiv.org

Cooperative Long Rope Skipping via Multi-Agent Reinforcement Learning

 🎯RLHF  Content type: Academic
arxiv.org

GARL: Game-Theoretic Reinforcement Learning for Multi-Agent Strategic Prioritisation

 🎯AI Agents  Content type: Academic
arxiv.org

Geometry-Aware Reinforcement Learning for 2D Irregular Nesting

 👁️Computer Vision  Content type: Academic
arxiv.org

Structure-Conditioned Actor-Critic Branches for Quality-Diversity Reinforcement Learning

 🎯RLHF  Content type: Academic
arxiv.org

On Advantage Estimates for Max@K Policy Gradients

 🎯RLHF  Content type: Academic
arxiv.org

Q-VGM: Q-Guided Value-Gradient Matching for Flow-Matching VLA Policies

 🤖AI  Content type: Academic
arxiv.org

Enhancing the MADDPG Algorithm for Multi-Agent Learning via Action Inference and Importance Sampling

 📈Time Series Analysis  Content type: Academic
arxiv.org

Merging model-based control with multi-agent reinforcement learning for multi-agent cooperative teaming strategies

 🎯AI Agents  Content type: Academic
arxiv.org

Policy-Conditioned Counterfactual Credit for Verifiable Reinforcement Learning of Long-Horizon Language Agents

 ⚛️Physics  Content type: Academic
arxiv.org

RUBAS: Rubric-Based Reinforcement Learning for Agent Safety

 🔐Cryptography  Content type: Academic
arxiv.org

Fog of Love: Engineering Virtuous Agent Behavior with Affinity-based Reinforcement Learning in a Game Environment

 📈Optimization  Content type: Academic
arxiv.org

Agentic Monte Carlo: Simulating Reinforcement Learning for Black-Box Agents

 🎲Probability  Content type: Academic
arxiv.org

Alpha-RTL: Test-Time Training for RTL Hardware Optimization

 🎛️Fine-tuning  Content type: Academic
arxiv.org

Reinforcement Learning from Rich Feedback with Distributional DAgger

 🎯RLHF  Content type: Academic
arxiv.org
