🎮 Reinforcement Learning
RL, reward functions, policy gradient, RLHF
Scoured 386 posts in 33.1 ms
🤖 AI/ML · arXiv · 2 days ago
Backpropagating Through Simulation: Analytic Policy Gradients for Sample and Learning Efficient Differentiable Continuous Control
🎯 RLHF · ujangriswanto08.medium.com · 20 hours ago
The Beginner’s Guide to Policy Gradient and Reinforcement Learning
🎯 RLHF · fareedkhan-dev.github.io · 4 days ago
Train LLM from Scratch
Discussed on Hacker News
🎯 RLHF · grahamjroy.medium.com · 5 hours ago
Deep Q-Networks — When the Q-Table Won’t Fit
🎯 RLHF · www.beam.cloud (sitemap) · 2 days ago
Best Sandbox Providers for Reinforcement Learning in 2026
🧠 LLM Research · Bloomberg · 3 days ago
Tech Disruptors: Invisible Technologies on RLHF and LLM Training
🤖 Artificial Intelligence · medium.com · 18 hours ago
Gollum’s Reinforcement Learning Loop: How a Broken Reward Function Created the Ring’s Most Tragic…
🎯 RLHF · wire.insiderfinance.io · 2 days ago
How AI Learns to Trade Through Reward Signals (And Why It Often Fails)
🤖 Artificial Intelligence · IT之家 · 17 hours ago
SAIC Audi E5 Sportback Receives AUDI OS 1.3.0: Collision Safety Performance in Adjacent-Vehicle Cut-In Scenarios Improved Fivefold
🎯 RLHF · Nature · 4 days ago
Reinforcement learning-assisted distributionally robust energy management for multi-microgrid networks
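🤖 Artificial Intelligence · daily.zhihu.com · 1 hour ago
Many people say the final year of high school was the peak of their intelligence and knowledge. Is that true? Is there any scientific basis for this claim?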
🤝 AI-Assisted Coding · Hackster.io · 14 hours ago
Isaac Lab Example: Dual-Arm Nero Reach Training
⚡ LLM Optimization · medium.com · 3 days ago
CODE #3: EMERGENT DECAYING EPSILON-GREEDY Q-LEARNING (PYTHON)
🤖 AI Development · The Hollywood Reporter · 1 day ago
Hollywood Workers Are Training AI Models as Job Prospects Grow Slim
Covers 2 stories, including: I Work in Hollywood. Everyone Who Used to Make TV Is Now Secretly Training AI
Covered by Digital Trends
🔎 AI Interpretability · Tech Xplore · 12 hours ago
AI-driven race strategy could give Formula One teams competitive advantage
Covers 2 stories, including: Lisa Lock - Science X
🎯 RLHF · pure.mpg.de · 3 days ago
A longitudinal analysis of reinforcement learning in early childhood
🤖 AI Applications · kottke.org · 2 days ago
Room Tone
🎯 RLHF · ujangriswanto08.medium.com · 2 days ago
Cracking the Q-Learning Code: Step-by-Step Implementation Guide
🎯 AI Reliability · Semiconductor Engineering · 3 days ago
Event-Driven RL Targets Long-Horizon Fab Control
⚙️ LLM Fine-tuning · mlx-lora-studio.netlify.app · 6 days ago
MLX LoRA Studio — Fine-tune LLMs on your Mac
Covers ml-explore/mlx