Reinforcement Learning


Mechanistic Analysis of Alignment Algorithms in Language Models

Transformers · Content type: Academic
arxiv.org

On Advantage Estimates for Max@K Policy Gradients

AI · Content type: Academic
arxiv.org

Self-Paced Curriculum Reinforcement Learning for Autonomous Superbike Racing in Simulation

AI · Content type: Academic
arxiv.org

SocraticPO: Policy Optimization via Interactive Guidance

AI · Content type: Academic
arxiv.org

Uncertainty-Aware LLM-Guided Policy Shaping for Sparse-Reward Reinforcement Learning

AI · Content type: Academic
arxiv.org

Retry Policy Gradients in Continuous Action Spaces

AI · Content type: Academic
arxiv.org

Quantum-Inspired Reinforcement Learning for Low-Latency Intrusion Detection in V2X and Internet-of-Vehicles Networks

AI · Content type: Academic
arxiv.org

Reformulate LLM Reinforcement Learning for Efficient Training under Black-box Discrepancy

Transformers · Content type: Academic
arxiv.org

From Ticks to Flows: Dynamics of Neural Reinforcement Learning in Continuous Environments

AI · Content type: Academic
arxiv.org

Merging model-based control with multi-agent reinforcement learning for multi-agent cooperative teaming strategies

AI · Content type: Academic
arxiv.org

Learning to Attack and Defend: Adaptive Red Teaming of Language Models via GRPO

Transformers · Content type: Academic
arxiv.org

One Lens, Many Worlds: A Capability-Typed Interface for World-Model Interpretability

Transformers · Content type: Academic
arxiv.org

Rethinking the Divergence Regularization in LLM RL

AI · Content type: Academic
arxiv.org

Drag reduction or reward hacking? Recurrent multi-agent reinforcement learning that earns its reward

AI · Content type: Academic
arxiv.org

COP-Q: Safety-First Reinforcement Learning for Robot Control via Cholesky-Ordered Projection

AI · Content type: Academic
arxiv.org

LogNEO: A GPT-Neo Reinforcement Learning Framework for Accurate Real-Time Log Anomaly Detection

AI · Content type: Academic
arxiv.org

OrderGrad: Optimizing Beyond the Mean with Order-Statistic Policy Gradient Estimation

AI · Content type: Academic
arxiv.org

Learn to Match: Two-Sided Matching with Temporally Extended Feedback

Transformers · Content type: Academic
arxiv.org

Selective-Advantage Entropy-Adaptive Horizon GRPO: Asymmetric Token-Level Discounting for Efficient Reinforcement Learning of Language Models

Transformers · Content type: Academic
arxiv.org

Q-VGM: Q-Guided Value-Gradient Matching for Flow-Matching VLA Policies

Machine Learning · Content type: Academic
arxiv.org