🎯 Reinforcement Learning
RL, RLHF, reward models, policy optimization
Scoured 449 posts in 6.4 ms
Learning to replenish: A hybrid deep reinforcement learning for dynamic inventory management in the pharmaceutical supply chains
💾 Agent Memory
Content type: Academic
arxiv.org · 6 days ago
Multilingual Sentiment Aware Text Summarization A Reinforcement Learning Approach for Consistency Maintenance
🧠 LLMs
Content type: Academic
arxiv.org · 2 days ago
Momentum for Reasoning: Dense Intrinsic Signals in Policy Optimization
🧠 Reasoning Models
Content type: Academic
arxiv.org · 2 days ago
Merging model-based control with multi-agent reinforcement learning for multi-agent cooperative teaming strategies
🤖 AI Agents
Content type: Academic
arxiv.org · 6 days ago
A Regret Minimization Framework on Preference Learning in Large Language Models
🧠 LLMs
Content type: Academic
arxiv.org · 2 days ago
PRPO: Perception-Reinforced Policy Optimization via Token-Level Dynamic Advantage Reshaping
👁️ Multimodal AI
Content type: Academic
arxiv.org · 2 days ago
Drag reduction or reward hacking? Recurrent multi-agent reinforcement learning that earns its reward
🤖 AI Agents
Content type: Academic
arxiv.org · 6 days ago
CATPO: Critique-Augmented Tree Policy Optimization
🧠 Reasoning Models
Content type: Academic
arxiv.org · 2 days ago
DynaCF: Mitigating Shortcut Learning in Reward Models via Dynamic Counterfactual Sensitivity
💾 Agent Memory
Content type: Academic
arxiv.org · 2 days ago
TARPO: Token-Wise Latent-Explicit Reasoning via Action-Routing Policy Optimization
✍️ Prompt Engineering
Content type: Academic
arxiv.org · 6 days ago
Test-Time Gradient Guidance of Flow Policies in Reinforcement Learning
💻 AI Coding
Content type: Academic
arxiv.org · 1 day ago
Reasoning or Memorization? Direction-Aware Diversity Exploration in LLM Reinforcement Learning
🌐 World Models
Content type: Academic
arxiv.org · 1 day ago
Online KL-Regularized Reinforcement Learning with Function Approximation under Misspecification
🌐 World Models
Content type: Academic
arxiv.org · 6 days ago
Belief-Space Quantum-Inspired Reinforcement Learning for Partially Observable Autonomous Cyber Defense in the Internet of Vehicles
💾 Agent Memory
Content type: Academic
arxiv.org · 2 days ago
SARM2: Multi-Task Stage Aware Reward Modeling for Self Improving Robotic Manipulation
👁️ Multimodal AI
Content type: Academic
arxiv.org · 1 day ago
GenPO++: Generative Policy Optimization with Jacobian-free Likelihood Ratios
🌐 World Models
Content type: Academic
arxiv.org · 3 days ago
Development of COVID-19 Booster Vaccine Policy by Microsimulation and Q-learning
💎 Token Economics
Content type: Academic
arxiv.org · 1 day ago
Teaching the Way, Not the Answer: Privileged Tutoring Distillation for Multimodal Policy Optimization
👁️ Multimodal AI
Content type: Academic
arxiv.org · 3 days ago
On-sky demonstration of reinforcement learning for adaptive optics control
🎛️ Fine-tuning
Content type: Academic
arxiv.org · 1 day ago
The Hidden Bias of Process Reward Models: PRISM for Rewarding the Right Reasoning
🧠 Reasoning Models
Content type: Academic
arxiv.org · 2 days ago