Reinforcement Learning

On Advantage Estimates for Max@K Policy Gradients

馃AIContent type: Academic
arxiv.org

Architecture-Aware Reinforcement Learning Makes Sliding-Window Attention Competitive in Math Reasoning

馃Transformer ArchitectureContent type: Academic
arxiv.org

Development of COVID-19 Booster Vaccine Policy by Microsimulation and Q-learning

馃Neural Network ArchitecturesContent type: Academic
arxiv.org

Flow-DPPO: Divergence Proximal Policy Optimization for Flow Matching Models

LSTM Networks · Content type: Academic
arxiv.org

Selective-Advantage Entropy-Adaptive Horizon GRPO: Asymmetric Token-Level Discounting for Efficient Reinforcement Learning of Language Models

ML · Content type: Academic
arxiv.org

Event-Driven Reinforcement Learning Enables Long-Horizon Control in Semiconductor Fabrication

LSTM Networks · Content type: Academic
arxiv.org

PAWS: Preference Learning with Advantage-Weighted Segments

馃Transformer ArchitectureContent type: Academic
arxiv.org

Beyond Uniform Token-Level Trust Region in LLM Reinforcement Learning

Time Series Forecasting · Content type: Academic
arxiv.org

Improving Generalization and Data Efficiency with Diffusion in Offline Multi-agent RL

Synthetic Data Generation · Content type: Academic
arxiv.org

MDP-GRPO: Stabilized Group Relative Policy Optimization for Multi-Constraint Instruction Following

馃Transformer ArchitectureContent type: Academic
arxiv.org

Bellman-Taylor Score Decoding for Markov Decision Processes with State-Dependent Feasible Action Sets

LSTM Networks · Content type: Academic
arxiv.org

Path Planning Using Deep Deterministic Policy Gradient: A Reinforcement Learning Approach

馃Neural Network ArchitecturesContent type: Academic
arxiv.org

World Model Self-Distillation: Training World Models to Solve General Tasks

Synthetic Data Generation · Content type: Academic
arxiv.org

Uncertainty-Aware LLM-Guided Policy Shaping for Sparse-Reward Reinforcement Learning

馃Transformer ArchitectureContent type: Academic
arxiv.org

Self-Paced Curriculum Reinforcement Learning for Autonomous Superbike Racing in Simulation

馃Transformer ArchitectureContent type: Academic
arxiv.org

APPO: Agentic Procedural Policy Optimization

馃AIContent type: Academic
arxiv.org

Drag reduction or reward hacking? Recurrent multi-agent reinforcement learning that earns its reward

LSTM Networks · Content type: Academic
arxiv.org

QnRL: Quantum-Native Reinforcement Learning

ML · Content type: Academic
arxiv.org

IAPO: Input Attribution-Aware Policy Optimization for Tool Use in Small Multimodal Agents

馃Transformer ArchitectureContent type: Academic
arxiv.org

Belief-Space Quantum-Inspired Reinforcement Learning for Partially Observable Autonomous Cyber Defense in the Internet of Vehicles

Model Deployment · Content type: Academic
arxiv.org