Reinforcement Learning

Enhancing the MADDPG Algorithm for Multi-Agent Learning via Action Inference and Importance Sampling

 🤖LLMs  Content type: Academic
arxiv.org·

A Regret Minimization Framework on Preference Learning in Large Language Models

 🤖LLMs  Content type: Academic
arxiv.org·

Uncertainty-Aware LLM-Guided Policy Shaping for Sparse-Reward Reinforcement Learning

 🤖LLMs  Content type: Academic
arxiv.org·

3SPO: State-Score-Supervised Policy Optimization for LLM Agents

 🤖LLMs  Content type: Academic
arxiv.org·

Reinforcement Learning for Flow-Matching Policies with Density Transport

 👁️Computer Vision  Content type: Academic
arxiv.org·

Sequential Data Poisoning in LLM Post-Training

 🤖LLMs  Content type: Academic
arxiv.org·

Structure-Conditioned Actor-Critic Branches for Quality-Diversity Reinforcement Learning

 🤖AI  Content type: Academic
arxiv.org·

Fast and Highly Expressive Policy Learning for Offline Reinforcement Learning via Bootstrapped Flow Q-Learning

 🛡️AI Safety  Content type: Academic
arxiv.org·

Retry Policy Gradients in Continuous Action Spaces

 🤖AI  Content type: Academic
arxiv.org·

Mechanistic Analysis of Alignment Algorithms in Language Models

 🤖LLMs  Content type: Academic
arxiv.org·

OrderGrad: Optimizing Beyond the Mean with Order-Statistic Policy Gradient Estimation

 🤖AI  Content type: Academic
arxiv.org·

Distilling LLM Reasoning into an Interpretable Policy Tree for Human-AI Collaboration

 🛡️AI Safety  Content type: Academic
arxiv.org·

Dynamic Multi-Pair Trading Strategy in Cryptocurrency Markets with Deep Reinforcement Learning

 🤖AI  Content type: Academic
arxiv.org·

Towards End to End Motion Planning and Execution for Autonomous Underwater Vehicles Using Reinforcement Learning

 🛡️AI Safety  Content type: Academic
arxiv.org·

DOG-DPO: Dynamic Optimization in Geometry for Safety Alignment

 🛡️AI Safety  Content type: Academic
arxiv.org·

Self-evolving LLM agents with in-distribution Optimization

 🤖LLMs  Content type: Academic
arxiv.org·

Cooperative Long Rope Skipping via Multi-Agent Reinforcement Learning

 🤖AI  Content type: Academic
arxiv.org·

Fog of Love: Engineering Virtuous Agent Behavior with Affinity-based Reinforcement Learning in a Game Environment

 🛡️AI Safety  Content type: Academic
arxiv.org·

Principled Agent Debate: Adversarial Arbitration for Sycophancy Reduction in Large Language Models

 🤖LLMs  Content type: Academic
arxiv.org·

Development of COVID-19 Booster Vaccine Policy by Microsimulation and Q-learning

 🩺Health  Content type: Academic
arxiv.org·