Reinforcement Learning
RL, AI Agents, Game Playing, Policy Optimization
Scoured 41 posts in 7.9 ms
Sequent: scale and automation for higher confidence in alignment
AI Research
lesswrong.com · 12 hours ago
NVIDIA Nemotron 3 Ultra Powers Faster, More Efficient Reasoning for Long-Running Agents
Data Engineering
Content type: Blog
developer.nvidia.com · 6 days ago · Hacker News
AI model predicts building fire spread, redirecting evacuees to safer exits in real time
AI Research
techxplore.com · 6 days ago · Hacker News
How to Train Your Goblin
Prompt Engineering
goblins.mchen.workers.dev · 3 days ago · Hacker News
Mbodi AI (YC P25) Is Hiring Founding Machine Learning Engineer (Robotics)
Entrepreneurship
ycombinator.com · 4 days ago · Hacker News
KJLdefeated/RL.cu: RLVR training for LLM in CUDA/C++
AI Infrastructure
Content type: Code
github.com · 3 days ago · Hacker News
OpenEnv is now owned by HF, Torch, Prime Intellect, Unsloth, Modal, Mercor, and more! Use it for training agents.
AI
Content type: Blog
huggingface.co · 3 days ago · Hacker News, r/LocalLLaMA
Tracing Eval-Awareness Emergence Through Training of OLMo 3
Prompt Engineering
lesswrong.com · 17 hours ago
Best explanations of how LLMs work
AI Infrastructure
Content type: Blog
vorushin.github.io · 4 days ago · Hacker News
Show HN: The Deterministic Core Architecture for AI-Augmented Applications
Prompt Engineering
brandonbellsystems.com · 5 days ago · Hacker News
The Effective Sample Size
Machine Learning
alex.smola.org · 6 days ago · Hacker News
Why Robotics Is a Pre-Paradigm Field
Prompt Engineering
Content type: News
whattotelltherobot.com · 4 days ago · Hacker News
Bumblebees can spontaneously solve problems, study finds
AI Research
arstechnica.com · 6 days ago
Issue 654
Data Engineering
Content type: Blog
datascienceweekly.substack.com · 6 days ago · Substack
Introducing the Third Generation of Apple’s Foundation Models
AI Infrastructure
machinelearning.apple.com · 3 days ago · Hacker News, r/apple
Rohin Shah on AGI Safety
Prompt Engineering
lesswrong.com · 6 days ago
Optimisation over non-stationary distributions creates weirder minds
Prompt Engineering
lesswrong.com · 5 days ago
Training Deliberative Monitors for Black-Box Scheming Detection
Claude
lesswrong.com · 6 days ago
(Mis)generalization of Helpful-Only Fine-tuning
Claude
lesswrong.com · 6 days ago
Do We Want a Superintelligent People-Pleaser?
Claude
lesswrong.com · 5 days ago