Reinforcement Learning
Reinforcement Learning for Flow-Matching Policies with Density Transport
🤖 Machine Learning · Content type: Academic · arxiv.org · 2 days ago

Uncertainty-Aware LLM-Guided Policy Shaping for Sparse-Reward Reinforcement Learning
🤖 Transformers · Content type: Academic · arxiv.org · 3 days ago

Event-Driven Reinforcement Learning Enables Long-Horizon Control in Semiconductor Fabrication
👁️ Attention Mechanisms · Content type: Academic · arxiv.org · 1 day ago

Cooperative Long Rope Skipping via Multi-Agent Reinforcement Learning
🤖 AI · Content type: Academic · arxiv.org · 2 days ago

MDP-GRPO: Stabilized Group Relative Policy Optimization for Multi-Constraint Instruction Following
⚙️ Systems Programming · Content type: Academic · arxiv.org · 6 days ago

Self-Evolving Scientific Agent Discovers Generalizable Physically-Reasoned Fluid Control
📚 Compilers · Content type: Academic · arxiv.org · 2 days ago

Policy-Conditioned Counterfactual Credit for Verifiable Reinforcement Learning of Long-Horizon Language Agents
🤖 AI · Content type: Academic · arxiv.org · 6 days ago

3SPO: State-Score-Supervised Policy Optimization for LLM Agents
🤖 AI · Content type: Academic · arxiv.org · 1 day ago

An Agency-Transferring Model-Free Policy Enhancement Technique
🤖 Machine Learning · Content type: Academic · arxiv.org · 2 days ago

Learning to replenish: A hybrid deep reinforcement learning for dynamic inventory management in the pharmaceutical supply chains
🤖 Machine Learning · Content type: Academic · arxiv.org · 6 days ago

Geometry-Aware Reinforcement Learning for 2D Irregular Nesting
⚡ SIMD Optimization · Content type: Academic · arxiv.org · 1 day ago

Distilling LLM Reasoning into an Interpretable Policy Tree for Human-AI Collaboration
🤖 Transformers · Content type: Academic · arxiv.org · 2 days ago

Alpha-RTL: Test-Time Training for RTL Hardware Optimization
⚙️ JIT Compilation · Content type: Academic · arxiv.org · 6 days ago

GIFT: LLM-Guided State-Reward Interface for Financial Reinforcement Learning
🤖 AI · Content type: Academic · arxiv.org · 2 days ago

GenPO++: Generative Policy Optimization with Jacobian-free Likelihood Ratios
🤖 AI · Content type: Academic · arxiv.org · 3 days ago

Failure Modes of Deep Multi-Agent RL in Asynchronous Pricing: Reproducible Triggers, Trace Diagnostics, and a Partial Fix
🤖 AI · Content type: Academic · arxiv.org · 1 day ago

Beyond Uniform Token-Level Trust Region in LLM Reinforcement Learning
🤖 Transformers · Content type: Academic · arxiv.org · 1 day ago

Learning Predictive Control with Deep Koopman Operators for Autonomous Vehicle Motion Planning
🤖 Robotics · Content type: Academic · arxiv.org · 2 days ago

Learning Multi-Agent Communication Protocol: Study on Information Entropy Efficiency in MARL
🤖 AI · Content type: Academic · arxiv.org · 3 days ago

Claw-R1: A Step-Level Data Middleware System for Agentic Reinforcement Learning
🤖 AI · Content type: Academic · arxiv.org · 2 days ago