🎯 Reinforcement Learning
RL, RLHF, reward models, policy optimization
Scoured 405 posts in 6.9 ms
Representation-Aware Advantage Estimation: Your Reward Model Provides More Than A Scalar Output
🧠 LLMs · Content type: Academic · arxiv.org · 1 day ago
Selective-Advantage Entropy-Adaptive Horizon GRPO: Asymmetric Token-Level Discounting for Efficient Reinforcement Learning of Language Models
🎛️ Fine-tuning · Content type: Academic · arxiv.org · 6 days ago
SocraticPO: Policy Optimization via Interactive Guidance
💾 Agent Memory · Content type: Academic · arxiv.org · 1 day ago
SHAPO: Sharpness-Aware Policy Optimization for Safe Exploration
💻 AI Coding · Content type: Academic · arxiv.org · 1 day ago
Agentic Monte Carlo: Simulating Reinforcement Learning for Black-Box Agents
🧠 Reasoning Models · Content type: Academic · arxiv.org · 6 days ago
Event-Driven Reinforcement Learning Enables Long-Horizon Control in Semiconductor Fabrication
💾 Agent Memory · Content type: Academic · arxiv.org · 1 day ago
SALT: When More Rollouts Don't Help in Group-Based Policy Optimization and How to Make Them Matter
💾 Agent Memory · Content type: Academic · arxiv.org · 6 days ago
Towards End to End Motion Planning and Execution for Autonomous Underwater Vehicles Using Reinforcement Learning
🌐 World Models · Content type: Academic · arxiv.org · 2 days ago
Beyond Uniform Token-Level Trust Region in LLM Reinforcement Learning
🧠 LLMs · Content type: Academic · arxiv.org · 1 day ago
Progress-SQL: Improving Reinforcement Learning for Text-to-SQL via Progressive Rewards
💾 Agent Memory · Content type: Academic · arxiv.org · 3 days ago
GIFT: LLM-Guided State-Reward Interface for Financial Reinforcement Learning
🌐 World Models · Content type: Academic · arxiv.org · 2 days ago
MDP-GRPO: Stabilized Group Relative Policy Optimization for Multi-Constraint Instruction Following
📊 Model Evaluation · Content type: Academic · arxiv.org · 6 days ago
Deep reinforcement learning for process design: Review and perspective
🧠 Reasoning Models · Content type: Academic · arxiv.org · 2 days ago
Geometry-Aware Reinforcement Learning for 2D Irregular Nesting
🌐 World Models · Content type: Academic · arxiv.org · 1 day ago
Learning to replenish: A hybrid deep reinforcement learning for dynamic inventory management in the pharmaceutical supply chains
💾 Agent Memory · Content type: Academic · arxiv.org · 6 days ago
Momentum for Reasoning: Dense Intrinsic Signals in Policy Optimization
🧠 Reasoning Models · Content type: Academic · arxiv.org · 2 days ago
Multilingual Sentiment Aware Text Summarization A Reinforcement Learning Approach for Consistency Maintenance
🧠 LLMs · Content type: Academic · arxiv.org · 2 days ago
Merging model-based control with multi-agent reinforcement learning for multi-agent cooperative teaming strategies
🤖 AI Agents · Content type: Academic · arxiv.org · 6 days ago
A Regret Minimization Framework on Preference Learning in Large Language Models
🧠 LLMs · Content type: Academic · arxiv.org · 2 days ago
PRPO: Perception-Reinforced Policy Optimization via Token-Level Dynamic Advantage Reshaping
👁️ Multimodal AI · Content type: Academic · arxiv.org · 2 days ago