🎮 Reinforcement Learning
RL, reward function, policy gradient, Q-learning, OpenAI Gym
Dynamical Priors as a Training Objective in Reinforcement Learning
🤖 AI · arxiv.org · 6d
How to build custom reasoning agents with a fraction of the compute
🧠 LLMs · venturebeat.com · 2d
The Data Layer Tax for Robot Learning
🧠 LLMs · rerun.io · 14h · Hacker News
Every Model Learned by Gradient Descent Is Approximately a Kernel Machine
🧠 LLMs · news.ycombinator.com · 2h · Hacker News
Boiler combustion optimization via offline reinforcement learning with an ensemble high-dimensional environment
🤖 AI · sciencedirect.com · 2d
Reinforcement fine-tuning with LLM-as-a-judge
🧠 LLMs · aws.amazon.com · 7h
How does Reinforcement Learning Affect Models
🧠 LLMs · lesswrong.com · 3d
Is your AI strategy missing a "Safety Net"? 🛡️
🤖 AI · turingpost.com · 6h
Learning diverse natural behaviors for enhancing the agility of quadrupedal robots
🧠 LLMs · nature.com · 1d
Red-teaming a network of agents: Understanding what breaks when AI agents interact at scale
🤖 AI · microsoft.com · 5h
Deep Learning Weekly: Issue 453
🧠 LLMs · deeplearningweekly.com · 12h
Virtual Cards for AI Agents
🤖 AI · agentcard.ai · 1h
Jaxpot: Train self-play RL agents FAST by parallelizing environments on GPU
🧠 LLMs · bardsai.substack.com · 2d · Substack
A new GitHub repo to detect reward hacking in RL models
🧠 LLMs · github.com · 4d · Hacker News
Constraints That Compute: A Unified Framework for Efficient Intelligence from Prime Harmonics to Latent Reasoning
🤖 AI · zenodo.org · 10h · Hacker News
Building an AI-Powered Prediction Engine for Racing Data: A Developer's Journey
🤖 AI · altilineverir.com.tr · 4h · DEV
There Will Be a Scientific Theory of Deep Learning
🤖 AI · mail.bycloud.ai · 1d
Wild parrots exhibit age-dependent conformity when learning about novel food
🤖 AI · journals.plos.org · 13h
On-Policy vs Off-Policy RL: PPO vs SAC on 5 Gymnasium Tasks
🤖 AI · tildalice.io · 4d
The Policy Picks the Policy
🧠 LLMs · noise2signal.bearblog.dev · 2d