Scour
🎮 Reinforcement Learning (Specific): RL, reward functions, policy gradient, RLHF
Scoured 388 posts in 5.5 ms
Self-evolving LLM agents with in-distribution Optimization
🧠 AI Agents · Content type: Academic · arxiv.org · 3 days ago
Researchers develop AI-powered railway control system for efficient urban train operation
🧠 AI Agents · techxplore.com · 1 day ago
How to Implement a Model-Free RL Algorithm: A Step-by-Step Guide
🤖 Coding Agents · Content type: Blog · ujangriswanto08.medium.com · 10 hours ago
Q-Learning (Reinforcement learning): Bellman Equation, Markov Decision Processes, Q-Values, and…
🤖 Machine Learning · Content type: Blog · medium.com · 2 days ago
Reasoning RL in 2026: GRPO, DPO, RLVR, Agentic PO & Beyond
🤖 AI · turingpost.com · 4 days ago
Scale Robot Reinforcement Learning with NVIDIA Isaac Lab on Amazon SageMaker AI
🏗️ AI Infrastructure · Content type: Blog · aws.amazon.com · 1 day ago
SimarcLabs/pybullet-swarm-sim: Python framework for simulating drone swarms with PyBullet in seconds.
🧠 AI Agents · Content type: Code · github.com · 3 days ago · r/opensource
Agents Need Work Data: A Primer on RLWD, or Reinforcement Learning on Work Data
🧠 AI Agents · anjalishriva.com · 1 day ago · Hacker News
Reinforcement Learning and Optimal Control Book (RIP Dimitri Bertsekas)
🤖 Machine Learning · Content type: Academic · web.mit.edu · 6 days ago · Hacker News
Are Classical Machine Learning Jobs Dying?
🤖 Machine Learning · Content type: Blog · medium.com · 2 days ago
Multi-agent rendezvous in fluid flows via reinforcement learning
🧠 AI Agents · Content type: Academic · arxiv.org · 12 hours ago
Researchers trained an open source AI search agent, Harness-1, that outperforms GPT-5.4 on recalling relevant information
🤖 Machine Learning · venturebeat.com · 2 days ago · Hacker News
Time-slip in AI sepsis models may inflate results, risking under- or overtreatment
🧠 AI Agents · medicalxpress.com · 5 days ago
Model predictive task sampling for efficient and robust adaptation
💬 LLMs · Content type: Academic · nature.com · 2 days ago
AI Agent Mastery & Coaching
🧠 AI Agents · ruv.io · 3 days ago
Introducing North Mini Code: Cohere’s First Model For Developers
🤖 Machine Learning · Content type: Blog · huggingface.co · 2 days ago · Hacker News
Reinforcement Learning Disrupts Gradient-Based Adversarial Optimization
🤖 Machine Learning · Content type: Academic · arxiv.org · 12 hours ago
Import AI 460: Reward hacking society, RSI data from Anthropic; and RL-based quadcopter racing
🧠 AI Agents · jack-clark.net · 3 days ago
Some Interesting Papers on RLVR
💬 LLMs · lesswrong.com · 1 day ago
Memoirs of a Learning Machine: Autobiographical Self-Training and the Self-Training Gap
🧠 AI Agents · zenodo.org · 4 days ago · Hacker News