🎮 Reinforcement Learning
RL, reward functions, policy gradient, agents, simulation
Scoured 149214 posts in 17.4 ms
Markov Decision Processes: The Language of Reinforcement Learning
🧠 AI Agents · medium.com · 4d
Hierarchical Reinforcement Learning with Augmented Step-Level Transitions for LLM Agents
🧠 LLMs · arxiv.org · 2d
Rethinking Robotics Reinforcement Learning: A Practical Humanoid Training Workflow
🤖 Robotics · semiengineering.com · 1d
Continual learning for AI agents
🧠 AI Agents · bestblogs.dev · 4d
Google DeepMind's Research Lets an LLM Rewrite Its Own Game Theory Algorithms — And It Outperformed the Experts
🤖 AI · marktechpost.com · 6d · r/singularity
Reinforcement Learning From Human Feedback (RLHF) in Large Language Models (LLMs)
🧠 LLMs · pub.towardsai.net · 6d
Three Ways Machines Learn
🤖 AI · medium.com · 3d
Value-Guidance MeanFlow for Offline Multi-Agent Reinforcement Learning
🧠 AI Agents · arxiv.org · 7h
Continual learning for AI agents
🧠 AI Agents · blog.langchain.com · 4d · Hacker News
Target Policy Optimization
🧠 LLMs · arxiv.org · 2d
PriPG-RL: Privileged Planner-Guided Reinforcement Learning for Partially Observable Systems with Anytime-Feasible MPC
🕸️ Distributed Systems · arxiv.org · 7h
Aligning Agents via Planning: A Benchmark for Trajectory-Level Reward Modeling
🧠 AI Agents · arxiv.org · 7h
Predictive Representations for Skill Transfer in Reinforcement Learning
🧠 AI Agents · arxiv.org · 1d
Multi-agent Reach-avoid MDP via Potential Games and Low-rank Policy Structure
🕸️ Distributed Systems · arxiv.org · 7h
SEARL: Joint Optimization of Policy and Tool Graph Memory for Self-Evolving Agents
🧠 AI Agents · arxiv.org · 7h
MARL-GPT: Foundation Model for Multi-Agent Reinforcement Learning
🕸️ Distributed Systems · arxiv.org · 2d
Beyond Stochastic Exploration: What Makes Training Data Valuable for Agentic Search
🧠 AI Agents · arxiv.org · 7h
DROP: Distributional and Regular Optimism and Pessimism for Reinforcement Learning
🕸️ Distributed Systems · arxiv.org · 1d
Reinforcement Learning with LLM-Guided Action Spaces for Synthesizable Lead Optimization
🧠 LLMs · arxiv.org · 7h
Value Mirror Descent for Reinforcement Learning
🕸️ Distributed Systems · arxiv.org · 2d