Reinforcement Learning

Geometry-Aware Reinforcement Learning for 2D Irregular Nesting

 🕵️LLM Agents  Content type: Academic
arxiv.org

Reinforcement Learning for Flow-Matching Policies with Density Transport

 🤖AI  Content type: Academic
arxiv.org

3SPO: State-Score-Supervised Policy Optimization for LLM Agents

 🕵️LLM Agents  Content type: Academic
arxiv.org

Dynamic Multi-Pair Trading Strategy in Cryptocurrency Markets with Deep Reinforcement Learning

 🔥PyTorch  Content type: Academic
arxiv.org

TT-DAC-PS: Twin-Target Deterministic Actor-Critic with Policy Smoothing for Optimal Trade Execution

 🤖AI  Content type: Academic
arxiv.org

Representation-Aware Advantage Estimation: Your Reward Model Provides More Than A Scalar Output

 🤖AI  Content type: Academic
arxiv.org

Self-evolving LLM agents with in-distribution Optimization

 🕵️LLM Agents  Content type: Academic
arxiv.org

Enhancing the MADDPG Algorithm for Multi-Agent Learning via Action Inference and Importance Sampling

 🕵️LLM Agents  Content type: Academic
arxiv.org

Beyond Uniform Token-Level Trust Region in LLM Reinforcement Learning

 💬LLMs  Content type: Academic
arxiv.org

Uncertainty-Aware LLM-Guided Policy Shaping for Sparse-Reward Reinforcement Learning

 🧠Machine Learning  Content type: Academic
arxiv.org

SHAPO: Sharpness-Aware Policy Optimization for Safe Exploration

 📉Loss Landscapes  Content type: Academic
arxiv.org

Cooperative Long Rope Skipping via Multi-Agent Reinforcement Learning

 🤖Robotics  Content type: Academic
arxiv.org

Fog of Love: Engineering Virtuous Agent Behavior with Affinity-based Reinforcement Learning in a Game Environment

 🕵️LLM Agents  Content type: Academic
arxiv.org

Mitigating Bias in Low-SNR Financial Reinforcement Learning via Quantum Representations

 📶Communications  Content type: Academic
arxiv.org

The Neutral Mask: How RLHF Provides Shallow Alignment while Leaving Partisan Structure Intact in a Large Language Model

 🧠LLM  Content type: Academic
arxiv.org

SocraticPO: Policy Optimization via Interactive Guidance

 🕵️LLM Agents  Content type: Academic
arxiv.org

From Ticks to Flows: Dynamics of Neural Reinforcement Learning in Continuous Environments

 🎲Stochastic Processes  Content type: Academic
arxiv.org

QnRL: Quantum-Native Reinforcement Learning

 📐Optimization Theory  Content type: Academic
arxiv.org

MODIP: Efficient Model-Based Optimization for Diffusion Policies

 📐Semidefinite Programming  Content type: Academic
arxiv.org

An Agency-Transferring Model-Free Policy Enhancement Technique

 📐Semidefinite Programming  Content type: Academic
arxiv.org
