Scour
Reinforcement Learning
RL, reward functions, policy gradient, RLHF
Policy Improvement Reinforcement Learning · Game Theory · arxiv.org · 1d
How does Reinforcement Learning Affect Models · LLMs · lesswrong.com · 3d
The Data Layer Tax for Robot Learning · Machine Learning · rerun.io · 14h · Hacker News
Every Model Learned by Gradient Descent Is Approximately a Kernel Machine · Machine Learning · news.ycombinator.com · 2h · Hacker News
How to build custom reasoning agents with a fraction of the compute · LLMs · venturebeat.com · 2d
Reinforcement fine-tuning with LLM-as-a-judge · LLMs · aws.amazon.com · 7h
Adaptive home energy management to self-motivated user preferences via iterative LLM-augmented reinforcement learning · AI Agents · sciencedirect.com · 5d
Learning diverse natural behaviors for enhancing the agility of quadrupedal robots · AI Agents · nature.com · 1d
There Will Be a Scientific Theory of Deep Learning · AI · mail.bycloud.ai · 1d
A new GitHub repo to detect reward hacking in RL models · AI Agents · github.com · 4d · Hacker News
Jaxpot: Train self-play RL agents FAST by parallelizing environments on GPU · AI Agents · bardsai.substack.com · 2d · Substack
On-Policy vs Off-Policy RL: PPO vs SAC on 5 Gymnasium Tasks · AI Agents · tildalice.io · 4d
The Policy Picks the Policy · AI Agents · noise2signal.bearblog.dev · 2d
DEEP Robotics · Neural Networks · youtube.com · 3d · r/singularity
Artificial Intelligence: Foundations of Computational Agents · AI Agents · artint.info · 3d · Hacker News
Lyapunov-Guided Self-Alignment: Test-Time Adaptation for Offline Safe Reinforcement Learning · LLMs · arxiv.org · 23h
Fixing What LLMs Get Wrong (22 minute read) · LLMs · thebigdataguy.substack.com · 4d · Substack
Boiler combustion optimization via offline reinforcement learning with an ensemble high-dimensional environment · Machine Learning · sciencedirect.com · 2d
RL, in pictures and videos · AI Agents · suriya.cc · 6d
Show HN: A live autonomous economic network for AI agents · AI Agents · ainetwork-global.github.io · 3d · Hacker News