Reinforcement Learning


Towards End to End Motion Planning and Execution for Autonomous Underwater Vehicles Using Reinforcement Learning

🌐 World Models
Content type: Academic
arxiv.org

Structure-Conditioned Actor-Critic Branches for Quality-Diversity Reinforcement Learning

🌐 World Models
Content type: Academic
arxiv.org

Policy-Conditioned Counterfactual Credit for Verifiable Reinforcement Learning of Long-Horizon Language Agents

🌐 World Models
Content type: Academic
arxiv.org

Flow-DPPO: Divergence Proximal Policy Optimization for Flow Matching Models

🌐 World Models
Content type: Academic
arxiv.org

A Barrier-Modulated Architecture for Safe Affine Formation Control in Second-Order Multi-Agent Systems

♠️ Game Theory
Content type: Academic
arxiv.org

Selective-Advantage Entropy-Adaptive Horizon GRPO: Asymmetric Token-Level Discounting for Efficient Reinforcement Learning of Language Models

🌐 World Models
Content type: Academic
arxiv.org

SAW: Stage-Aware Dynamic Weighting for Multi-Objective Reinforcement Learning in Large Language Models

🌐 World Models
Content type: Academic
arxiv.org

Learning to replenish: A hybrid deep reinforcement learning for dynamic inventory management in pharmaceutical supply chains

🌐 World Models
Content type: Academic
arxiv.org

Belief-Space Quantum-Inspired Reinforcement Learning for Partially Observable Autonomous Cyber Defense in the Internet of Vehicles

🌐 World Models
Content type: Academic
arxiv.org

OrderGrad: Optimizing Beyond the Mean with Order-Statistic Policy Gradient Estimation

🌐 World Models
Content type: Academic
arxiv.org

ARTA: Adaptive Reinforcement-Learning-Based Throttling Agent for RowHammer Vulnerabilities

🌐 World Models
Content type: Academic
arxiv.org

Representation Learning Enables Scalable Multitask Deep Reinforcement Learning

🌐 World Models
Content type: Academic
arxiv.org

Mitigating Bias in Low-SNR Financial Reinforcement Learning via Quantum Representations

🌐 World Models
Content type: Academic
arxiv.org

Discovering Interpretable Multi-Parameter Control Policies for Evolutionary Algorithms Using Deep Reinforcement Learning

🌐 World Models
Content type: Academic
arxiv.org

Constrained Deep Reinforcement Learning for Cognitive Radar Resource Management

🌐 World Models
Content type: Academic
arxiv.org

Alpha-RTL: Test-Time Training for RTL Hardware Optimization

🌐 World Models
Content type: Academic
arxiv.org

SALT: When More Rollouts Don't Help in Group-Based Policy Optimization and How to Make Them Matter

🌐 World Models
Content type: Academic
arxiv.org

The Impact of Market Informedness on Market Makers' Profitability

🌐 World Models
Content type: Academic
arxiv.org

SocraticPO: Policy Optimization via Interactive Guidance

🌐 World Models
Content type: Academic
arxiv.org

Q-VGM: Q-Guided Value-Gradient Matching for Flow-Matching VLA Policies

📄 AI Research
Content type: Academic
arxiv.org
