🎮 Reinforcement Learning
RLHF, Policy Gradient, Reward Models, Agent Training
Reasoning RL in 2026: GRPO, DPO, RLVR, Agentic PO & Beyond
🎯 RLHF
turingpost.com · 2 days ago
Reinforcement Learning and Optimal Control Book (RIP Dimitri Bertsekas)
🤖 AI
Content type: Academic
web.mit.edu · 4 days ago · Hacker News
Some Interesting Papers on RLVR
📐 Linear Algebra
lesswrong.com · 14 hours ago
Q-Learning (Reinforcement learning): Bellman Equation, Markov Decision Processes, Q-Values, and…
🎯 RLHF
Content type: Blog
medium.com · 1 day ago
Scale Robot Reinforcement Learning with NVIDIA Isaac Lab on Amazon SageMaker AI
🟩 Nvidia
Content type: Blog
aws.amazon.com · 13 hours ago
Deep Learning Weekly: Issue 458
🤖 Large Language Models
deeplearningweekly.com · 5 days ago
SimarcLabs/pybullet-swarm-sim: Python framework for simulating drone swarms with PyBullet in seconds.
📊 Data Visualization
Content type: Code
github.com · 2 days ago · r/opensource
Prompt Injection Defense Pipeline
🛡️ LLM Security
emergentmind.com · 6 days ago
Direct Preference Optimization Beyond Chatbots
🎯 RLHF
Content type: Blog
huggingface.co · 6 days ago · Hacker News
Location: Göttingen, Germany Remote: Yes (preferred; hybrid also fine) Willing t...
🤖 Large Language Models
Content type: Discussion
news.ycombinator.com · 6 days ago · Hacker News
Good teachers don’t cheat
🧮 Complexity Theory
Content type: Blog
jasonkena.github.io · 6 days ago · Hacker News
AI Paper Review: Training Language Models to Follow Instructions with Human Feedback (InstructGPT)
🤖 GenAI
freecodecamp.org · 6 days ago
SLUUG Talk: Demystifying Large Language Models on Linux
🤖 GenAI
Content type: Code
github.com · 3 days ago · DEV
DDPG from Scratch: 400-Line PyTorch Implementation
🤖 AI
tildalice.io · 6 days ago
Nvidia Nemotron 3 Ultra
🤖 AI
research.nvidia.com · 6 days ago · Hacker News
Towards Shutdownable Agents: Generalizing Stochastic Choice in RL Agents and LLMs
🧠 LLM
lesswrong.com · 6 days ago
The Sycophancy Problem: Why AI Can’t Stop Agreeing With You
🤖 AI
moroccoworldnews.com · 6 days ago
umair-tareen/philosopher-council: An eleven-philosopher LLM council - ask it questions or point it at AI-research trends. Claude-powered deliberation through the four classical branches of philosophy. Methodology, not metaphysics.
🏛️ Philosophy
Content type: Code
github.com · 4 days ago · r/SideProject
My research agenda and work
🤖 AI
lesswrong.com · 4 days ago
Improve your agent’s tool-calling accuracy with SFT and DPO on Amazon SageMaker AI
🎯 RLHF
Content type: Blog
aws.amazon.com · 6 days ago