Reinforcement Learning
RL, reward functions, policy gradient, RLHF
🎯 RLHF · arXiv · 20 hours ago
Bias-Controlled Primal-Dual Natural Actor-Critic: Optimal Rates for Constrained Multi-Objective Average-Reward RL
🔬 AI Research · medium.com · 6 days ago
ICLR 2026 Test of Time: DDPG and the jump to continuous control
🤖 AI agent development · medium.com · 3 days ago
How I Reverse Engineered Snake Rattle Roll to Train an AI (Part 1)
🎯 RLHF · ujangriswanto08.medium.com · 1 day ago
How Q-Learning is Changing Robotics and Autonomous Systems
🏗️ AI Infrastructure · cnbeta.com.tw · 3 days ago
Google deepens collaboration with MediaTek, developing an upgraded TPU in a bet on AI agents
🎯 RLHF · Nature · 5 days ago
Attention modulates value normalization in human reinforcement learning by shaping reward encoding
🎯 RLHF · grahamjroy.medium.com · 6 days ago
Q-Learning — Learning to Act Without a Map
🎯 RLHF · arXiv · 2 days ago
Modularized Reinforcement Learning on LLMs: From MDP Creation to Exploration and Learning
🎯 RLHF · wire.insiderfinance.io · 3 days ago
Training a Trading Agent Using Reinforcement Learning: Reality vs Theory
🎯 RLHF · eLife · 4 days ago
Neural signatures of model-based and model-free reinforcement learning across prefrontal cortex and striatum
🧠 LLM Tooling · daily.zhihu.com · 6 days ago
A Collection of 2026 Interview Experiences for RL Roles
🛡️ AI Safety · medium.com · 3 days ago
Reward hacking in Reinforcement learning
🎯 RLHF · arXiv · 1 day ago
KLip-PPO: A per-sample KL perspective on PPO-Clip
🤖 Anthropic Claude · rhp.bearblog.dev · 4 days ago
Mini-spire: a fast Slay the Spire RL environment in C++
🛡️ AI Safety · The Decoder · 6 days ago
OpenAI researchers show small doses of "beneficial trait" training make AI models broadly safer and harder to manipulate
Covers: Reinforcement learning towards broadly and persistently beneficial models
🏗️ AI Infrastructure · Stories by 郭明錤 (Ming-Chi Kuo) on Medium via medium.com · 3 days ago
Google and MediaTek Deepen TPU v9 Collaboration with Upgraded Triggerfish, Targeting AI Agents…
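🤖 Agentic Engineering · IT之家 · 4 days ago
Huawei Qiankun ADS 5 reportedly rolling out soon, with Harmony Intelligent Mobility Alliance (HIMA) flagship models first to receive it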
🎯 RLHF · arXiv · 2 days ago
ReFPO: Reflow Regularization for Flow Matching Policy Gradients
🧠 Context Engineering · arXiv · 20 hours ago
Compositional Behavioral Semantics for State Abstraction in Reinforcement Learning
🏗️ AI Infrastructure · IT之家 · 3 days ago
Ming-Chi Kuo: Google is developing an inference-optimized upgrade of the TPU v9 chip, with MediaTek winning the order