🎮 Reinforcement Learning
RLHF, Policy Gradient, Reward Models, Agent Training
Scoured 459 posts in 10.5 ms
Enhancing the MADDPG Algorithm for Multi-Agent Learning via Action Inference and Importance Sampling
🤖 LLMs · Content type: Academic · arxiv.org · 1 week ago
A Regret Minimization Framework on Preference Learning in Large Language Models
🤖 LLMs · Content type: Academic · arxiv.org · 2 days ago
Uncertainty-Aware LLM-Guided Policy Shaping for Sparse-Reward Reinforcement Learning
🤖 LLMs · Content type: Academic · arxiv.org · 3 days ago
3SPO: State-Score-Supervised Policy Optimization for LLM Agents
🤖 LLMs · Content type: Academic · arxiv.org · 1 day ago
Reinforcement Learning for Flow-Matching Policies with Density Transport
👁️ Computer Vision · Content type: Academic · arxiv.org · 2 days ago
Sequential Data Poisoning in LLM Post-Training
🤖 LLMs · Content type: Academic · arxiv.org · 1 week ago
Structure-Conditioned Actor-Critic Branches for Quality-Diversity Reinforcement Learning
🤖 AI · Content type: Academic · arxiv.org · 2 days ago
Fast and Highly Expressive Policy Learning for Offline Reinforcement Learning via Bootstrapped Flow Q-Learning
🛡️ AI Safety · Content type: Academic · arxiv.org · 1 day ago
Retry Policy Gradients in Continuous Action Spaces
🤖 AI · Content type: Academic · arxiv.org · 6 days ago
Mechanistic Analysis of Alignment Algorithms in Language Models
🤖 LLMs · Content type: Academic · arxiv.org · 1 day ago
OrderGrad: Optimizing Beyond the Mean with Order-Statistic Policy Gradient Estimation
🤖 AI · Content type: Academic · arxiv.org · 6 days ago
Distilling LLM Reasoning into an Interpretable Policy Tree for Human-AI Collaboration
🛡️ AI Safety · Content type: Academic · arxiv.org · 2 days ago
Dynamic Multi-Pair Trading Strategy in Cryptocurrency Markets with Deep Reinforcement Learning
🤖 AI · Content type: Academic · arxiv.org · 1 week ago
Towards End to End Motion Planning and Execution for Autonomous Underwater Vehicles Using Reinforcement Learning
🛡️ AI Safety · Content type: Academic · arxiv.org · 2 days ago
DOG-DPO: Dynamic Optimization in Geometry for Safety Alignment
🛡️ AI Safety · Content type: Academic · arxiv.org · 2 days ago
Self-evolving LLM agents with in-distribution Optimization
🤖 LLMs · Content type: Academic · arxiv.org · 3 days ago
Cooperative Long Rope Skipping via Multi-Agent Reinforcement Learning
🤖 AI · Content type: Academic · arxiv.org · 2 days ago
Fog of Love: Engineering Virtuous Agent Behavior with Affinity-based Reinforcement Learning in a Game Environment
🛡️ AI Safety · Content type: Academic · arxiv.org · 1 week ago
Principled Agent Debate: Adversarial Arbitration for Sycophancy Reduction in Large Language Models
🤖 LLMs · Content type: Academic · arxiv.org · 2 days ago
Development of COVID-19 Booster Vaccine Policy by Microsimulation and Q-learning
🩺 Health · Content type: Academic · arxiv.org · 1 day ago