🎮 Reinforcement Learning
Specific
RL, reward functions, policy gradient, RLHF
Scoured 186585 posts in 17.2 ms

Software Agents: The management challenge
AI Agents · hypecycles.com · 6d

Which one is more important: more parameters or more computation? (2021)
📐 ML Theory · parl.ai · 6d · Hacker News

Rule-based High-Level Coaching for Goal-Conditioned Reinforcement Learning in Search-and-Rescue UAV Missions Under Limited-Simulation Training
AI Agents · arxiv.org · 21h

Reddit as a Reinforcement Learning Gym for Persuasion
Machine Learning · thediff.co · 6d

Uncertainty-Aware Reward Discounting for Mitigating Reward Hacking
♠️ Game Theory · arxiv.org · 21h

Fail safe(r) at alignment by channeling reward-hacking into a "spillway" motivation
♠️ Game Theory · lesswrong.com · 3d

K-Score: Kalman Filter as a Principled Alternative to Reward Normalization in Reinforcement Learning
📐 ML Theory · arxiv.org · 2d

On the Complexity of Robust Markov Decision Processes and Bisimulation Metrics
♠️ Game Theory · arxiv.org · 21h

Deep Policy Iteration for High-Dimensional Mean-Field Games with Regenerative Reformulation
📐 ML Theory · arxiv.org · 21h

Uncertainty-Aware Predictive Safety Filters for Probabilistic Neural Network Dynamics
📐 ML Theory · arxiv.org · 21h

When Errors Can Be Beneficial: A Categorization of Imperfect Rewards for Policy Gradient
AI Agents · arxiv.org · 1d

reward-lens: A Mechanistic Interpretability Library for Reward Models
📐 ML Theory · arxiv.org · 21h

Addressing Performance Saturation for LLM RL via Precise Entropy Curve Control
💬 LLMs · arxiv.org · 21h

A Reward-Free Viewpoint on Multi-Objective Reinforcement Learning
AI Agents · arxiv.org · 2d

Application of Deep Reinforcement Learning to Event-Triggered Control for Networked Artificial Pancreas Systems
♠️ Game Theory · arxiv.org · 21h

Entropy Centroids as Intrinsic Rewards for Test-Time Scaling
📐 ML Theory · arxiv.org · 21h

CAPSULE: Control-Theoretic Action Perturbations for Safe Uncertainty-Aware Reinforcement Learning
♠️ Game Theory · arxiv.org · 2d

DORA: A Scalable Asynchronous Reinforcement Learning System for Language Model Training
💬 LLMs · arxiv.org · 21h

RL Token: Bootstrapping Online RL with Vision-Language-Action Models
💬 LLMs · arxiv.org · 2d

Co-Learning Port-Hamiltonian Systems and Optimal Energy-Shaping Control
📐 ML Theory · arxiv.org · 21h