Reinforcement Learning


Test-Time Gradient Guidance of Flow Policies in Reinforcement Learning

 🤖Robotics  Content type: Academic
arxiv.org

Geometrically Averaged Hard Target Updates for Linear Q-Learning

 📐Optimization Theory  Content type: Academic
arxiv.org

Fog of Love: Engineering Virtuous Agent Behavior with Affinity-based Reinforcement Learning in a Game Environment

 🕵️LLM Agents  Content type: Academic
arxiv.org

SARM2: Multi-Task Stage Aware Reward Modeling for Self Improving Robotic Manipulation

 🤖Robotics  Content type: Academic
arxiv.org

From Ticks to Flows: Dynamics of Neural Reinforcement Learning in Continuous Environments

 🎲Stochastic Processes  Content type: Academic
arxiv.org

TT-DAC-PS: Twin-Target Deterministic Actor-Critic with Policy Smoothing for Optimal Trade Execution

 🤖AI  Content type: Academic
arxiv.org

Dmsh: A Multi-Agent Reinforcement Learning Framework for All-Quad Mesh Generation

 📐Semidefinite Programming  Content type: Academic
arxiv.org

RUBAS: Rubric-Based Reinforcement Learning for Agent Safety

 🕵️LLM Agents  Content type: Academic
arxiv.org

Cooperative Long Rope Skipping via Multi-Agent Reinforcement Learning

 🤖Robotics  Content type: Academic
arxiv.org

The Neutral Mask: How RLHF Provides Shallow Alignment while Leaving Partisan Structure Intact in a Large Language Model

 🧠LLM  Content type: Academic
arxiv.org

BiasGRPO: Stabilizing Bias Mitigation in High-Variance Reward Landscapes via Group-Relative Policy Optimization

 🤖AI  Content type: Academic
arxiv.org

QnRL: Quantum-Native Reinforcement Learning

 📐Optimization Theory  Content type: Academic
arxiv.org

Self-evolving LLM agents with in-distribution Optimization

 🕵️LLM Agents  Content type: Academic
arxiv.org

An Agency-Transferring Model-Free Policy Enhancement Technique

 📐Semidefinite Programming  Content type: Academic
arxiv.org

Geometry-Aware Reinforcement Learning for 2D Irregular Nesting

 🕵️LLM Agents  Content type: Academic
arxiv.org

Uncertainty-Aware LLM-Guided Policy Shaping for Sparse-Reward Reinforcement Learning

 🧠Machine Learning  Content type: Academic
arxiv.org

3SPO: State-Score-Supervised Policy Optimization for LLM Agents

 🕵️LLM Agents  Content type: Academic
arxiv.org

Representation-Aware Advantage Estimation: Your Reward Model Provides More Than A Scalar Output

 🤖AI  Content type: Academic
arxiv.org

AgentJet: A Flexible Swarm Training Framework for Agentic Reinforcement Learning

 🕵️LLM Agents  Content type: Academic
arxiv.org

Claw-R1: A Step-Level Data Middleware System for Agentic Reinforcement Learning

 💡AI Reasoning  Content type: Academic
arxiv.org
