Reinforcement Learning

Neuro-Symbolic Injection of LTLf Constraints in Autoregressive Reinforcement Learning Policies

LLMs | Content type: Academic
arxiv.org

Multi-agent rendezvous in fluid flows via reinforcement learning

AI Agents | Content type: Academic
arxiv.org

Adaptive Loss Balancing for Noise-Robust GRPO in Generative Recommendation

Transformers | Content type: Academic
arxiv.org

Merging model-based control with multi-agent reinforcement learning for multi-agent cooperative teaming strategies

AI Agents | Content type: Academic
arxiv.org

Improving Generalization and Data Efficiency with Diffusion in Offline Multi-agent RL

AI Agents | Content type: Academic
arxiv.org

ConSteer-RL: Steering Reasoning Capabilities in Large Language Models via Confidence-Aware Reinforcement Learning

LLMs | Content type: Academic
arxiv.org

Deep reinforcement learning for process design: Review and perspective

Deep Learning | Content type: Academic
arxiv.org

EEGDancer: Dynamic Emotion Latent Space Masked Modeling with Reinforcement Learning for EEG Continuous Emotion Prediction

AI Research | Content type: Academic
arxiv.org

SVoT: State-aware Visualization-of-Thought for Spatial Reasoning via Reinforcement Learning

Transformers | Content type: Academic
arxiv.org

Beyond Uniform Token-Level Trust Region in LLM Reinforcement Learning

LLMs | Content type: Academic
arxiv.org

KinematicRL: A Sim-to-Real Reinforcement Learning Framework For Social Navigation With Kinodynamic Feasibility

AI Agents | Content type: Academic
arxiv.org

Selective-Advantage Entropy-Adaptive Horizon GRPO: Asymmetric Token-Level Discounting for Efficient Reinforcement Learning of Language Models

Transformers | Content type: Academic
arxiv.org

Safe-RULE: Safe Reinforcement UnLEarning

Deep Learning | Content type: Academic
arxiv.org

RLCSD: Reinforcement Learning with Contrastive On-Policy Self-Distillation

Deep Learning | Content type: Academic
arxiv.org

Belief-Space Quantum-Inspired Reinforcement Learning for Partially Observable Autonomous Cyber Defense in the Internet of Vehicles

Scaling Laws | Content type: Academic
arxiv.org

StainFlow: Entity-Stain Tracking and Evidence Linking for Process Rewards in GUI Agents

AI Agents | Content type: Academic
arxiv.org

RoboNaldo: Accurate, Stable and Powerful Humanoid Soccer Shooting via Motion-Guided Curriculum Reinforcement Learning

Transformers | Content type: Academic
arxiv.org

DynaCF: Mitigating Shortcut Learning in Reward Models via Dynamic Counterfactual Sensitivity

Interpretability | Content type: Academic
arxiv.org

ProcessThinker: Enhancing Multi-modal Large Language Models Reasoning via Rollout-based Process Reward

AI Research | Content type: Academic
arxiv.org

SALT: When More Rollouts Don't Help in Group-Based Policy Optimization and How to Make Them Matter

Scaling Laws | Content type: Academic
arxiv.org
