🎮 Reinforcement Learning
RL, reward functions, policy gradient, RLHF
Scoured 386 posts in 8.3 ms
Fog of Love: Engineering Virtuous Agent Behavior with Affinity-based Reinforcement Learning in a Game Environment
🤖 AI Agents · Academic · arxiv.org · 6 days ago
Researchers develop AI-powered railway control system for efficient urban train operation
🧮 Algorithms · techxplore.com · 3 hours ago
Q-Learning (Reinforcement learning): Bellman Equation, Markov Decision Processes, Q-Values, and…
♟️ Game Theory · Blog · medium.com · 1 day ago
Agents Need Work Data: A Primer on RLWD, or Reinforcement Learning on Work Data
🤖 AI Agents · anjalishriva.com · 23 hours ago · Hacker News
Scale Robot Reinforcement Learning with NVIDIA Isaac Lab on Amazon SageMaker AI
🤖 AI · Blog · aws.amazon.com · 20 hours ago
Reasoning RL in 2026: GRPO, DPO, RLVR, Agentic PO & Beyond
🤖 Machine Learning · turingpost.com · 3 days ago
AI Agent Mastery & Coaching
🔍 RAG · ruv.io · 2 days ago
Researchers trained an open source AI search agent, Harness-1, that outperforms GPT-5.4 on recalling relevant information
🤖 AI · venturebeat.com · 1 day ago · Hacker News
Some Interesting Papers on RLVR
∑ Math · lesswrong.com · 21 hours ago
See, Act, Correct: three levers for working with a code agent
🤖 AI · Blog · blog.owulveryck.info · 6 days ago · Hacker News
Social intelligence Arises Between Minds
🤖 AI Agents · psychologytoday.com · 2 days ago
Reinforcement Learning and Optimal Control Book (RIP Dimitri Bertsekas)
🧮 Algorithms · Academic · web.mit.edu · 5 days ago · Hacker News
Reinforcement learning in linear embedding space unlocks generalizable control across soft robot configurations
🤖 AI Agents · Academic · nature.com · 2 days ago
NVIDIA Nemotron 3 Ultra Powers Faster, More Efficient Reasoning for Long-Running Agents
🤖 AI Agents · Blog · developer.nvidia.com · 6 days ago · Hacker News
DDPG from Scratch: 400-Line PyTorch Implementation
🤖 Machine Learning · tildalice.io · 6 days ago
Less-relevant results
Why LLMs (still) lack taste
🤖 Machine Learning · beyondtheprior.com · 1 day ago · Hacker News
Time-slip in AI sepsis models may inflate results, risking under- or overtreatment
🤖 AI · medicalxpress.com · 4 days ago
Fast and Highly Expressive Policy Learning for Offline Reinforcement Learning via Bootstrapped Flow Q-Learning
📐 ML Theory · Academic · arxiv.org · 12 hours ago
Import AI 460: Reward hacking society, RSI data from Anthropic; and RL-based quadcopter racing
🤖 AI Agents · jack-clark.net · 2 days ago
Cohere open-sources a coding agent that runs on a single H100
💬 LLMs · venturebeat.com · 18 hours ago