🎮 Reinforcement Learning
Q-Learning, Policy Gradients, Environments, Rewards
Geometrically Averaged Hard Target Updates for Linear Q-Learning
⚡ Query Optimization · Content type: Academic · arxiv.org · 18 hours ago
Reasoning RL in 2026: GRPO, DPO, RLVR, Agentic PO & Beyond
🤖 AI · turingpost.com · 3 days ago
Q-Learning (Reinforcement learning): Bellman Equation, Markov Decision Processes, Q-Values, and…
🤖 AI · Content type: Blog · medium.com · 1 day ago
Researchers develop AI-powered railway control system for efficient urban train operation
🤖 ML · techxplore.com · 9 hours ago
Scale Robot Reinforcement Learning with NVIDIA Isaac Lab on Amazon SageMaker AI
🤖 AI · Content type: Blog · aws.amazon.com · 1 day ago
Reinforcement Learning and Optimal Control Book (RIP Dimitri Bertsekas)
🤖 ML · Content type: Academic · web.mit.edu · 5 days ago · Hacker News
Some Interesting Papers on RLVR
🤖 AI · lesswrong.com · 1 day ago
Good teachers don’t cheat
🤖 AI · Content type: Blog · jasonkena.github.io · 6 days ago · Hacker News
DQN Tutorial - RL Summer School 2026
🤖 AI · araffin.github.io · 1 day ago
AI-powered living business intelligence network
⚡ Query Optimization · atlasforgex.com · 8 hours ago · Hacker News
SimarcLabs/pybullet-swarm-sim: Python framework for simulating drone swarms with PyBullet in seconds.
🤖 AI · Content type: Code · github.com · 3 days ago · r/opensource
Why LLMs (still) lack taste
🤖 AI · beyondtheprior.com · 1 day ago · Hacker News
Hrithik Roshan Signs With Anonymous Content
💾 Database · Content type: News · deadline.com · 6 hours ago
Spotlight On: Dreamplug Technologies Private Limited (CRED), a New Principal Participating Organization
🏗️ Data Engineering · Content type: Blog · blog.pcisecuritystandards.org · 2 days ago
Fast and Highly Expressive Policy Learning for Offline Reinforcement Learning via Bootstrapped Flow Q-Learning
🤖 AI · Content type: Academic · arxiv.org · 18 hours ago
Nvidia Nemotron 3 Ultra
🤖 AI · research.nvidia.com · 6 days ago · Hacker News
You're doing it wrong
🔀 Transformers · Content type: News · understandably.com · 1 day ago
Stack Overflow didn't just help AI learn to code
🤖 AI · zozo123.github.io · 3 days ago · Hacker News
Import AI 460: Reward hacking society, RSI data from Anthropic; and RL-based quadcopter racing
🤖 AI · Content type: News, Blog · importai.substack.com · 2 days ago · Substack
Why Claude Produces High-Quality Output: A Developer’s Guide to Token Efficiency and Hallucination…
🤖 AI · Content type: Blog · medium.com · 5 days ago