Reinforcement Learning
RL, reward, policy, agent, Q-learning
Scoured 390 posts in 7.1 ms
🥇Top AI Papers of the Week
🧠 LLM Research · Content type: News · nlp.elvissaravia.com · 2 days ago
See, Act, Correct: three levers for working with a code agent
🧠 LLM Research · Content type: Blog · blog.owulveryck.info · 6 days ago · Hacker News
Import AI 460: Reward hacking society, RSI data from Anthropic; and RL-based quadcopter racing
🛡️ AI Safety · jack-clark.net · 2 days ago
NVIDIA Nemotron 3 Ultra Powers Faster, More Efficient Reasoning for Long-Running Agents
🤖 AI Engineering · Content type: Blog · developer.nvidia.com · 6 days ago · Hacker News
Agentic RL: Token-In, Token-Out Done Right
🤖 AI Engineering · qgallouedec-tito.hf.space · 19 hours ago · Hacker News
Some Interesting Papers on RLVR
🧠 LLM Research · lesswrong.com · 18 hours ago
Model predictive task sampling for efficient and robust adaptation
🤖 AI Engineering · Content type: Academic · nature.com · 1 day ago
NVIDIA Enables the Next Era Of Physical AI Research With Agent Skills For Autonomous Vehicles, Robotics And Vision AI
🤖 Robotics · Content type: Blog · blogs.nvidia.com · 6 days ago
Researchers trained an open source AI search agent, Harness-1, that outperforms GPT-5.4 on recalling relevant information
🧠 LLM Research · venturebeat.com · 1 day ago · Hacker News
Test-Time Gradient Guidance of Flow Policies in Reinforcement Learning
🛡️ AI Safety · Content type: Academic · arxiv.org · 9 hours ago
Time-slip in AI sepsis models may inflate results, risking under- or overtreatment
🛡️ AI Safety · medicalxpress.com · 4 days ago
Experts weigh in on Anthropic’s Fable 5, Mythos 5 releases
🤖 AI Engineering · sdtimes.com · 12 hours ago
A Functional Taxonomy of World Models
🤖 AI Engineering · a16z.news · 6 days ago
NAVER Expands AI Infrastructure With NVIDIA to Serve Surging Global AI Demand
🧠 LLM Research · nvidianews.nvidia.com · 2 days ago
Test Your Skills Against an AI Air Hockey Robot
🤖 Robotics · Content type: News · hackster.io · 5 days ago
Geometrically Averaged Hard Target Updates for Linear Q-Learning
🔩 ML Compilers · Content type: Academic · arxiv.org · 9 hours ago
Are Classical Machine Learning Jobs Dying?
🧠 LLM Research · Content type: Blog · medium.com · 1 day ago
Memoirs of a Learning Machine: Autobiographical Self-Training and the Self-Training Gap
🧠 LLM Research · zenodo.org · 3 days ago · Hacker News
Sasha Rush explains targeted on-policy self-distillation, a reinforcement learning technique that corrects specific LLM rollout errors
🧠 LLM Research · digg.com · 6 days ago
Spotlight On: Dreamplug Technologies Private Limited (CRED), a New Principal Participating Organization
🔧 Backend Dev · Content type: Blog · blog.pcisecuritystandards.org · 2 days ago