Reinforcement Learning


Rethinking the Divergence Regularization in LLM RL

LLMs | Content type: Academic
arxiv.org

The Neutral Mask: How RLHF Provides Shallow Alignment while Leaving Partisan Structure Intact in a Large Language Model

LLMs | Content type: Academic
arxiv.org

APPO: Agentic Procedural Policy Optimization

AI Agents | Content type: Academic
arxiv.org

Transformer-Enhanced Reinforcement Learning: Fundamentals and Applications in Communication Networks

Transformers | Content type: Academic
arxiv.org

Seeing Before Colliding: Anticipatory Safe RL with Frozen Vision-Language Models

Interpretability | Content type: Academic
arxiv.org

Reinforcement Learning for Flow-Matching Policies with Density Transport

Model Training | Content type: Academic
arxiv.org

Plan-and-Verify Video Reward Reasoning with Spatio-Temporal Scene Graph Grounding

Transformers | Content type: Academic
arxiv.org

Semi-Offline Reinforcement Learning for Optimized Text Generation

AI Research | Content type: Academic
arxiv.org

HARBOR: A Harness Framework for Agentic Robot Reinforcement Learning

AI Agents | Content type: Academic
arxiv.org

Critic Architecture Matters: Dual vs. Unified Critics for Humanoid Loco-Manipulation

Model Training | Content type: Academic
arxiv.org

GIFT: LLM-Guided State-Reward Interface for Financial Reinforcement Learning

LLMs | Content type: Academic
arxiv.org

Policy-Conditioned Counterfactual Credit for Verifiable Reinforcement Learning of Long-Horizon Language Agents

AI Agents | Content type: Academic
arxiv.org

Verifiable Environments Are LEGO Bricks: Recursive Composition for Reasoning Generalization

Scaling Laws | Content type: Academic
arxiv.org

Multilingual Sentiment Aware Text Summarization: A Reinforcement Learning Approach for Consistency Maintenance

LLMs | Content type: Academic
arxiv.org

Architecture-Aware Reinforcement Learning Makes Sliding-Window Attention Competitive in Math Reasoning

Model Training | Content type: Academic
arxiv.org

Online KL-Regularized Reinforcement Learning with Function Approximation under Misspecification

Model Training | Content type: Academic
arxiv.org

A Regret Minimization Framework on Preference Learning in Large Language Models

LLMs | Content type: Academic
arxiv.org

Claw-R1: A Step-Level Data Middleware System for Agentic Reinforcement Learning

AI Agents | Content type: Academic
arxiv.org

IAPO: Input Attribution-Aware Policy Optimization for Tool Use in Small Multimodal Agents

AI Agents | Content type: Academic
arxiv.org

Agentic Monte Carlo: Simulating Reinforcement Learning for Black-Box Agents

AI Research | Content type: Academic
arxiv.org
