Reinforcement Learning

Verifiable Environments Are LEGO Bricks: Recursive Composition for Reasoning Generalization

 📐Scaling Laws  Content type: Academic
arxiv.org

Geometrically Averaged Hard Target Updates for Linear Q-Learning

 📐Scaling Laws  Content type: Academic
arxiv.org

Architecture-Aware Reinforcement Learning Makes Sliding-Window Attention Competitive in Math Reasoning

 ⚙️Model Training  Content type: Academic
arxiv.org

Discovering Interpretable Multi-Parameter Control Policies for Evolutionary Algorithms Using Deep Reinforcement Learning

 🔍Interpretability  Content type: Academic
arxiv.org

Uncertainty-Aware LLM-Guided Policy Shaping for Sparse-Reward Reinforcement Learning

 🔄Transformers  Content type: Academic
arxiv.org

Variational Proximal Policy Optimization

 📉Deep Learning  Content type: Academic
arxiv.org

IAPO: Input Attribution-Aware Policy Optimization for Tool Use in Small Multimodal Agents

 🤖AI Agents  Content type: Academic
arxiv.org

Representation Learning Enables Scalable Multitask Deep Reinforcement Learning

 📉Deep Learning  Content type: Academic
arxiv.org

Structure-Conditioned Actor-Critic Branches for Quality-Diversity Reinforcement Learning

 📐Scaling Laws  Content type: Academic
arxiv.org

Test-Time Gradient Guidance of Flow Policies in Reinforcement Learning

 ⚙️Model Training  Content type: Academic
arxiv.org

Multi-agent rendezvous in fluid flows via reinforcement learning

 🤖AI Agents  Content type: Academic
arxiv.org

On Advantage Estimates for Max@K Policy Gradients

 📐Scaling Laws  Content type: Academic
arxiv.org

Representation-Aware Advantage Estimation: Your Reward Model Provides More Than A Scalar Output

 ⚙️Model Training  Content type: Academic
arxiv.org

Improving Generalization and Data Efficiency with Diffusion in Offline Multi-agent RL

 🤖AI Agents  Content type: Academic
arxiv.org

UNIQ: Conformal Calibration for Adaptive Conservatism in Offline Reinforcement Learning

 🧠AI Research  Content type: Academic
arxiv.org

Retry Policy Gradients in Continuous Action Spaces

 📉Deep Learning  Content type: Academic
arxiv.org

DriveReward: A Comprehensive Dataset and Generative Vision-Language Reward Model for Autonomous Driving

 🖥️ML Systems  Content type: Academic
arxiv.org

SVoT: State-aware Visualization-of-Thought for Spatial Reasoning via Reinforcement Learning

 🔄Transformers  Content type: Academic
arxiv.org

Rethinking the Divergence Regularization in LLM RL

 💬LLMs  Content type: Academic
arxiv.org

Transformer-Enhanced Reinforcement Learning: Fundamentals and Applications in Communication Networks

 🔄Transformers  Content type: Academic
arxiv.org
