🎮 Reinforcement Learning
Specific
RL, reward functions, policy gradient, RLHF
Scoured 186661 posts in 18.1 ms
Policy Improvement Reinforcement Learning
♠️ Game Theory · arxiv.org · 1d
How does Reinforcement Learning Affect Models
💬 LLMs · lesswrong.com · 3d
The Data Layer Tax for Robot Learning
Machine Learning · rerun.io · 10h · Hacker News
Reinforcement fine-tuning with LLM-as-a-judge
💬 LLMs · aws.amazon.com · 3h
How to build custom reasoning agents with a fraction of the compute
💬 LLMs · venturebeat.com · 1d
Adaptive home energy management to self-motivated user preferences via iterative LLM-augmented reinforcement learning
AI Agents · sciencedirect.com · 5d
Learning diverse natural behaviors for enhancing the agility of quadrupedal robots
AI Agents · nature.com · 1d
There Will Be a Scientific Theory of Deep Learning
AI · mail.bycloud.ai · 1d
A new GitHub repo to detect reward hacking in RL models
AI Agents · github.com · 4d · Hacker News
Jaxpot: Train self-play RL agents FAST by parallelizing environments on GPU
AI Agents · bardsai.substack.com · 2d · Substack
On-Policy vs Off-Policy RL: PPO vs SAC on 5 Gymnasium Tasks
AI Agents · tildalice.io · 4d
DEEP Robotics
Neural Networks · youtube.com · 3d · r/singularity
Accelerate RL rollouts by up to 50% with distribution-aware speculative decoding
💬 LLMs · together.ai · 6d
Lyapunov-Guided Self-Alignment: Test-Time Adaptation for Offline Safe Reinforcement Learning
💬 LLMs · arxiv.org · 19h
Artificial Intelligence: Foundations of Computational Agents
AI Agents · artint.info · 3d · Hacker News
Boiler combustion optimization via offline reinforcement learning with an ensemble high-dimensional environment
Machine Learning · sciencedirect.com · 2d
Fixing What LLMs Get Wrong (22 minute read)
💬 LLMs · thebigdataguy.substack.com · 3d · Substack
RL, in pictures and videos
AI Agents · suriya.cc · 5d
Jamie Simon and Daniel Kunin, UC Berkeley: There Will Be a Scientific Theory of Deep Learning (Podcast, April 24, 2026)
AI · imbue.com · 6d
Show HN: A live autonomous economic network for AI agents
AI Agents · ainetwork-global.github.io · 3d · Hacker News