🎮 Reinforcement Learning
Agents, reward functions, Q-learning, policy optimization
Scoured 407 posts in 27.0 ms
🤖 Machine Learning · arXiv · 3 days ago
KLip-PPO: A per-sample KL perspective on PPO-Clip
👁️ Computer Vision · ujangriswanto08.medium.com · 1 day ago
A Comprehensive Guide to Implementing Policy Gradient for Reinforcement Learning
💬 NLP · utkarshmanojpandey.blogspot.com · 2 hours ago
The Rise of AI Agents: How They’re Changing Work in 2026
👁️ Computer Vision · medium.com · 13 hours ago
Reinforcement Learning — Basics
🤖 Machine Learning · towardsai.com · 2 days ago
Understanding Reinforcement Learning — A Primer
Covers: Beautiful Free Images & Pictures | Unsplash
🧠 Cognitive Science · Semiconductor Engineering · 5 days ago
Event-Driven RL Targets Long-Horizon Fab Control
👁️ Computer Vision · medium.com · 13 hours ago
The Moment AI Stops Predicting and Starts Choosing
👁️ Computer Vision · TNW | Artificial-Intelligence · 1 day ago
Patronus AI raises $50M to stress-test AI agents
🕸️ Neural Networks · ofir.io · 7 hours ago
How to Start Learning Deep Learning
Covers 3 stories, including Introduction to Pytorch
👁️ Computer Vision · Nature · 5 days ago
Reinforcement learning-assisted distributionally robust energy management for multi-microgrid networks
👁️ Computer Vision · VentureBeat · 2 days ago
Alibaba's model never trained as an agent — and improved agent performance across seven benchmarks
👁️ Computer Vision · grahamjroy.medium.com · 1 day ago
Deep Q-Networks — When the Q-Table Won’t Fit
💬 NLP · Bloomberg · 5 days ago
Tech Disruptors: Invisible Technologies on RLHF and LLM Training
💬 NLP · Hugging Face · 3 days ago
Qwen-AgentWorld-35B-A3B: a 3B-active MoE trained to simulate MCP, terminal, SWE, Android, web and OS environments
Covers 2 stories, including vllm-project/vllm
Covered by 4 sources, including vettedconsumer.com, GitHub
Discussed on r/LocalLLaMA
🧠 Cognitive Science · generalintuition.com · 1 day ago
The frontier research lab dedicated to games—and the real world.
Covered by 5 sources, including RuntimeWire, Tech Funding News
🔄 Systems Thinking · journals.plos.org · 1 day ago
Single-threshold–guided adaptive cancer therapy with partial-cycle treatment: A mechanistic and reinforcement learning analysis
💬 NLP · cmswire.com · 2 days ago
MoEngage Acquires Aampe to Put a Dedicated AI Agent Behind Every Customer
👁️ Computer Vision · medium.com · 5 days ago
CODE #3: EMERGENT DECAYING EPSILON-GREEDY Q-LEARNING (PYTHON)
👁️ Computer Vision · TechCrunch · 1 day ago
Patronus AI lands $50M to build ‘digital worlds’ that stress-test AI agents
Covered by SiliconANGLE, AI资讯速览
👁️ Computer Vision · www.beam.cloud (sitemap) · 3 days ago
Best Sandbox Providers for Reinforcement Learning in 2026