Reinforcement Learning
🎮 Reinforcement Learning
RLHF, Policy Gradient, Reward Models, Agent Training
Scoured 452 posts in 9.6 ms
Reasoning RL in 2026: GRPO, DPO, RLVR, Agentic PO & Beyond
🤖 LLMs
turingpost.com · 3 days ago
Variational Proximal Policy Optimization
🛡️ AI Safety
Content type: Academic
arxiv.org · 1 day ago
Researchers develop AI-powered railway control system for efficient urban train operation
🛡️ AI Safety
techxplore.com · 12 hours ago
Tracing Eval-Awareness Emergence Through Training of OLMo 3
🛡️ AI Safety
lesswrong.com · 14 hours ago
Scale Robot Reinforcement Learning with NVIDIA Isaac Lab on Amazon SageMaker AI
🛡️ AI Safety
Content type: Blog
aws.amazon.com · 1 day ago
SimarcLabs/pybullet-swarm-sim: Python framework for simulating drone swarms with PyBullet in seconds.
🐍 Python
Content type: Code
github.com · 3 days ago · r/opensource
local AI agents for Cursor with pre-tuned marketplace/commu
🤖 AI
locaible.com · 11 hours ago · Hacker News
Q-Learning (Reinforcement learning): Bellman Equation, Markov Decision Processes, Q-Values, and…
🤖 AI
Content type: Blog
medium.com · 2 days ago
Reinforcement Learning and Optimal Control Book (RIP Dimitri Bertsekas)
📱 Android
Content type: Academic
web.mit.edu · 5 days ago · Hacker News
AI Agent Mastery & Coaching
🤖 AI
ruv.io · 2 days ago
Agents Need Work Data: A Primer on RLWD, or Reinforcement Learning on Work Data
🛡️ AI Safety
anjalishriva.com · 1 day ago · Hacker News
Flow-DPPO: Divergence Proximal Policy Optimization for Flow Matching Models
👁️ Computer Vision
Content type: Academic
arxiv.org · 21 hours ago
NVIDIA Nemotron 3 Ultra Powers Faster, More Efficient Reasoning for Long-Running Agents
🛡️ AI Safety
Content type: Blog
developer.nvidia.com · 6 days ago · Hacker News
See, Act, Correct: three levers for working with a code agent
🤖 LLMs
Content type: Blog
blog.owulveryck.info · 6 days ago · Hacker News
Would a prepaid pass for a coding agent solve a real need or is it just my itch?
🐧 Linux
codehamr.com · 5 days ago · r/SideProject
Mult-DPO: Multinomial Direct Preference Optimization for Recommender Systems
🤖 LLMs
Content type: Academic
arxiv.org · 21 hours ago
École secondaire Notre-Dame-du-Sault to hold graduation on June 24
📈 economics
sootoday.com · 6 days ago
Memoirs of a Learning Machine: Autobiographical Self-Training and the Self-Training Gap
🛡️ AI Safety
zenodo.org · 4 days ago · Hacker News
SHAPO: Sharpness-Aware Policy Optimization for Safe Exploration
🛡️ AI Safety
Content type: Academic
arxiv.org · 21 hours ago
Uncertainty-Aware LLM-Guided Policy Shaping for Sparse-Reward Reinforcement Learning
🤖 LLMs
Content type: Academic
arxiv.org · 2 days ago