Reinforcement Learning
RL, RLHF, reward model, policy gradient
Scoured 241 posts in 6.3 ms
Rethinking the Divergence Regularization in LLM RL
LLMs · Content type: Academic · arxiv.org · 2 days ago
The Neutral Mask: How RLHF Provides Shallow Alignment while Leaving Partisan Structure Intact in a Large Language Model
LLMs · Content type: Academic · arxiv.org · 2 days ago
APPO: Agentic Procedural Policy Optimization
AI Agents · Content type: Academic · arxiv.org · 7 hours ago
Transformer-Enhanced Reinforcement Learning: Fundamentals and Applications in Communication Networks
Transformers · Content type: Academic · arxiv.org · 6 days ago
Seeing Before Colliding: Anticipatory Safe RL with Frozen Vision-Language Models
Interpretability · Content type: Academic · arxiv.org · 7 hours ago
Reinforcement Learning for Flow-Matching Policies with Density Transport
Model Training · Content type: Academic · arxiv.org · 2 days ago
Plan-and-Verify Video Reward Reasoning with Spatio-Temporal Scene Graph Grounding
Transformers · Content type: Academic · arxiv.org · 7 hours ago
Semi-Offline Reinforcement Learning for Optimized Text Generation
AI Research · Content type: Academic · arxiv.org · 6 days ago
HARBOR: A Harness Framework for Agentic Robot Reinforcement Learning
AI Agents · Content type: Academic · arxiv.org · 2 days ago
Critic Architecture Matters: Dual vs. Unified Critics for Humanoid Loco-Manipulation
Model Training · Content type: Academic · arxiv.org · 7 hours ago
GIFT: LLM-Guided State-Reward Interface for Financial Reinforcement Learning
LLMs · Content type: Academic · arxiv.org · 2 days ago
Policy-Conditioned Counterfactual Credit for Verifiable Reinforcement Learning of Long-Horizon Language Agents
AI Agents · Content type: Academic · arxiv.org · 6 days ago
Verifiable Environments Are LEGO Bricks: Recursive Composition for Reasoning Generalization
Scaling Laws · Content type: Academic · arxiv.org · 7 hours ago
Multilingual Sentiment Aware Text Summarization A Reinforcement Learning Approach for Consistency Maintenance
LLMs · Content type: Academic · arxiv.org · 2 days ago
Architecture-Aware Reinforcement Learning Makes Sliding-Window Attention Competitive in Math Reasoning
Model Training · Content type: Academic · arxiv.org · 7 hours ago
Online KL-Regularized Reinforcement Learning with Function Approximation under Misspecification
Model Training · Content type: Academic · arxiv.org · 6 days ago
A Regret Minimization Framework on Preference Learning in Large Language Models
LLMs · Content type: Academic · arxiv.org · 2 days ago
Claw-R1: A Step-Level Data Middleware System for Agentic Reinforcement Learning
AI Agents · Content type: Academic · arxiv.org · 2 days ago
IAPO: Input Attribution-Aware Policy Optimization for Tool Use in Small Multimodal Agents
AI Agents · Content type: Academic · arxiv.org · 7 hours ago
Agentic Monte Carlo: Simulating Reinforcement Learning for Black-Box Agents
AI Research · Content type: Academic · arxiv.org · 6 days ago