Reinforcement Learning

RUBAS: Rubric-Based Reinforcement Learning for Agent Safety

🕵️ LLM Agents · Content type: Academic
arxiv.org

Failure Modes of Deep Multi-Agent RL in Asynchronous Pricing: Reproducible Triggers, Trace Diagnostics, and a Partial Fix

🕵️ LLM Agents · Content type: Academic
arxiv.org

Claw-R1: A Step-Level Data Middleware System for Agentic Reinforcement Learning

💡 AI Reasoning · Content type: Academic
arxiv.org

BiasGRPO: Stabilizing Bias Mitigation in High-Variance Reward Landscapes via Group-Relative Policy Optimization

🤖 AI · Content type: Academic
arxiv.org

UNIQ: Conformal Calibration for Adaptive Conservatism in Offline Reinforcement Learning

🔥 PyTorch · Content type: Academic
arxiv.org

TRACE: A Unified Rollout Budget Allocation Framework for Efficient Agentic Reinforcement Learning

🕵️ LLM Agents · Content type: Academic
arxiv.org

Variational Proximal Policy Optimization

🧠 Machine Learning · Content type: Academic
arxiv.org

On-sky demonstration of reinforcement learning for adaptive optics control

📐 Estimation Theory · Content type: Academic
arxiv.org

AgentJet: A Flexible Swarm Training Framework for Agentic Reinforcement Learning

🕵️ LLM Agents · Content type: Academic
arxiv.org

GenPO++: Generative Policy Optimization with Jacobian-free Likelihood Ratios

🤖 AI · Content type: Academic
arxiv.org

Reformulate LLM Reinforcement Learning for Efficient Training under Black-box Discrepancy

🧠 LLM · Content type: Academic
arxiv.org

COP-Q: Safety-First Reinforcement Learning for Robot Control via Cholesky-Ordered Projection

🤖 Robotics · Content type: Academic
arxiv.org

Rethinking the Divergence Regularization in LLM RL

🧠 LLM · Content type: Academic
arxiv.org

EvalStop: Using World Feedback to Detect and Correct Reward Overoptimization in Multi-Tenant RLHF Platforms

🤖 AI · Content type: Academic
arxiv.org

Multilingual Sentiment Aware Text Summarization: A Reinforcement Learning Approach for Consistency Maintenance

🤖 AI · Content type: Academic
arxiv.org

CATPO: Critique-Augmented Tree Policy Optimization

💡 AI Reasoning · Content type: Academic
arxiv.org

Policy Gradient for Continuous-Time Robust Markov Decision Processes

🔢 Scientific Computing · Content type: Academic
arxiv.org

HARBOR: A Harness Framework for Agentic Robot Reinforcement Learning

🕵️ LLM Agents · Content type: Academic
arxiv.org

Scenario Generation for Risk-Aware Reinforcement Learning with Probably Approximately Safe Guarantees

🎛️ Control Systems · Content type: Academic
arxiv.org

Co-Evolving Skill Generation and Policy Optimization

🕵️ LLM Agents · Content type: Academic
arxiv.org