🎮 Reinforcement Learning
Q-Learning, Policy Gradient, RL Agents, Game AI
Scoured 348 posts in 35.8 ms
🤖 AI · Microsoft Developer Blogs · 4 days ago
Outcome-driven learning systems: Enterprise RL with OpenEnv and Foundry
Covers 3 stories, including SkillOpt: Executive Strategy for Self-Evolving Agent Skills
🤖 AI · medium.com · 23 hours ago
Reward hacking in Reinforcement learning
🤖 AI · The Decoder · 5 days ago
Nvidia research shows robots that train themselves through AI coding agents
🤖 AI · XYZ Labs · 17 hours ago
Meet Fugu: The AI Model That Doesn't Answer Your Question, It Hires a Team
Discussed on Substack
🤖 AI · chierhu.medium.com · 5 days ago
Scaling Self-Play with Self-Guidance: An AlphaZero-Style Path for Language Models
🔬 Science · Nature · 2 days ago
Attention modulates value normalization in human reinforcement learning by shaping reward encoding
🤖 AI · medium.com · 5 days ago
Learning by messing up: A beginner’s tour of Reinforcement Learning
🤖 AI · Digital Trends · 2 days ago
The Sashimi robot is real and it doesn’t fumble at slicing and dicing
Covered by kite.kagi.com, Teknikveckan
🤖 Machine Learning · alignment.openai.com · 4 days ago
Reinforcement learning towards broadly and persistently beneficial models
Covers Introducing ChatGPT Health
Covered by 6 sources, including The Decoder, tldr.tech
Discussed on Hacker News
⚙ Mechanical Engineering · NME · 2 days ago
BOYNEXTDOOR announce first ever world tour, ‘Knock On Vol. 2’
🤖 Machine Learning · medium.com · 5 days ago
Continual Learning — How to Update a Model Without It Forgetting Everything
🤖 AI · medium.com · 3 days ago
ICLR 2026 Test of Time: DDPG and the jump to continuous control
🤖 AI · abhishek-shankar.com · 3 days ago
The Best Agent Upgrade of the Year Wasn't a Model
🤖 Machine Learning · ujangriswanto08.medium.com · 5 days ago
What is SARSA? Understanding the On-Policy Learning Algorithm
🖨️ 3D Printing · semiconinsights.wordpress.com · 4 days ago
How Does Preference-Based Reinforcement Learning Optimize Robotic Assembly Sequences?
🤖 Machine Learning · technotes.substack.com · 2 days ago
Taste and judgement are lies we tell ourselves
Discussed on Substack
🤖 AI · JetBrains · 6 days ago
Step Rejection Fine-Tuning: Squeezing More Signal from Noisy Agent Trajectories
Covers Group Relative Policy Optimization (GRPO)
🤖 AI · brightray.ai · 5 days ago
Built Uber aggregator that tracks top AI researchers and leaders
Discussed on Hacker News
🤖 Machine Learning · sebiwette.de · 4 days ago
Background
🤖 AI · LessWrong · 2 days ago
The Cookie Monster Explains AI Safety
Covers 10 stories, including Anthropic confidentially submits draft S-1 to the SEC