Scour
🎮 Reinforcement Learning
RLHF, Policy Gradient, Reward Models, Agent Training
Scoured 460 posts in 14.4 ms

Posting for authoring
🧠 Philosophy · turingpost.com · 3 days ago

SHAPO: Sharpness-Aware Policy Optimization for Safe Exploration
🛡️ AI Safety · Content type: Academic · arxiv.org · 22 hours ago

Beyond Uniform Token-Level Trust Region in LLM Reinforcement Learning
🤖 LLMs · Content type: Academic · arxiv.org · 22 hours ago

The Neutral Mask: How RLHF Provides Shallow Alignment while Leaving Partisan Structure Intact in a Large Language Model
🤖 LLMs · Content type: Academic · arxiv.org · 1 day ago

Discovering Interpretable Multi-Parameter Control Policies for Evolutionary Algorithms Using Deep Reinforcement Learning
🤖 AI · Content type: Academic · arxiv.org · 22 hours ago

A Unifying Lens on Reward Uncertainty in RLHF
🤖 LLMs · Content type: Academic · arxiv.org · 1 day ago

SocraticPO: Policy Optimization via Interactive Guidance
🤖 AI · Content type: Academic · arxiv.org · 22 hours ago

Self-evolving LLM agents with in-distribution Optimization
🤖 LLMs · Content type: Academic · arxiv.org · 2 days ago

Test-Time Gradient Guidance of Flow Policies in Reinforcement Learning
🛡️ AI Safety · Content type: Academic · arxiv.org · 22 hours ago

3SPO: State-Score-Supervised Policy Optimization for LLM Agents
🤖 LLMs · Content type: Academic · arxiv.org · 22 hours ago

TT-DAC-PS: Twin-Target Deterministic Actor-Critic with Policy Smoothing for Optimal Trade Execution
🤖 AI · Content type: Academic · arxiv.org · 1 day ago

Fast and Highly Expressive Policy Learning for Offline Reinforcement Learning via Bootstrapped Flow Q-Learning
🛡️ AI Safety · Content type: Academic · arxiv.org · 22 hours ago

Multilingual Sentiment Aware Text Summarization A Reinforcement Learning Approach for Consistency Maintenance
🤖 LLMs · Content type: Academic · arxiv.org · 1 day ago

Mechanistic Analysis of Alignment Algorithms in Language Models
🤖 LLMs · Content type: Academic · arxiv.org · 22 hours ago

Performance Variation in Deep Reinforcement Learning
🛡️ AI Safety · Content type: Academic · arxiv.org · 2 days ago

Rethinking the Divergence Regularization in LLM RL
🤖 LLMs · Content type: Academic · arxiv.org · 1 day ago

Development of COVID-19 Booster Vaccine Policy by Microsimulation and Q-learning
🩺 Health · Content type: Academic · arxiv.org · 22 hours ago

BiasGRPO: Stabilizing Bias Mitigation in High-Variance Reward Landscapes via Group-Relative Policy Optimization
🤖 LLMs · Content type: Academic · arxiv.org · 6 days ago

Embodiment-conditioned Generalist Control for Multirotor Aerial Robots
🔄 Transformers · Content type: Academic · arxiv.org · 22 hours ago

Belief-Space Quantum-Inspired Reinforcement Learning for Partially Observable Autonomous Cyber Defense in the Internet of Vehicles
🛡️ AI Safety · Content type: Academic · arxiv.org · 1 day ago