Scour
🎮 Reinforcement Learning (Specific)
RL, reward functions, policy gradient, RLHF
Scoured 243 posts in 13.0 ms
Beyond Dexterity: Why Contact May Define the Next Era of Robotics
Robotics · Content type: Video, News · spectrum.ieee.org · 2 days ago · Hacker News
AI model predicts building fire spread, redirecting evacuees to safer exits in real time
AI Agents · techxplore.com · 6 days ago · Hacker News
Improving Generalization and Data Efficiency with Diffusion in Offline Multi-agent RL
🔬 Deep Learning · Content type: Academic · arxiv.org · 16 hours ago
Agentic RL: Token-In, Token-Out Done Right
🔤 Tokenization · qgallouedec-tito.hf.space · 2 days ago · Hacker News
Why Robotics Is a Pre-Paradigm Field
Machine Learning · Content type: News · whattotelltherobot.com · 5 days ago · Hacker News
Reinforcement Learning Disrupts Gradient-Based Adversarial Optimization
🔥 PyTorch · Content type: Academic · arxiv.org · 16 hours ago
CCKS: Consensus-based Communication and Knowledge Sharing
Knowledge Management · Content type: Academic · arxiv.org · 16 hours ago
Stack Overflow didn't just help AI learn to code
LLM · zozo123.github.io · 4 days ago · Hacker News
Fast and Highly Expressive Policy Learning for Offline Reinforcement Learning via Bootstrapped Flow Q-Learning
🚀 Bootstrapping · Content type: Academic · arxiv.org · 1 day ago
OpenEnv is now owned by HF, Torch, Prime Intellect, Unsloth, Modal, Mercor, and more! Use it for training agents.
Machine Learning · Content type: Blog · huggingface.co · 3 days ago · Hacker News, r/LocalLLaMA
Apple's New AI Models Contain 'None' of Google's Gemini Assistant
Obsidian · Content type: News · macrumors.com · 2 days ago · Hacker News
Space-sampled Value Decay: Forgetting Mechanisms for Non-stationary Deep Reinforcement Learning
🔥 PyTorch · Content type: Academic · arxiv.org · 16 hours ago
Geometrically Averaged Hard Target Updates for Linear Q-Learning
📈 Optimization · Content type: Academic · arxiv.org · 1 day ago
Inside soccer’s data renaissance
Data science · Content type: News · technologyreview.com · 10 hours ago · Hacker News
Dmsh: A Multi-Agent Reinforcement Learning Framework for All-Quad Mesh Generation
📈 Optimization · Content type: Academic · arxiv.org · 1 day ago
Vibe Diaries: Training Nanochat
🔤 Tokenization · vibediary.dev · 3 days ago · Hacker News
INFRAMIND: Infrastructure-Aware Multi-Agent Orchestration
LLM Inference · Content type: Academic · arxiv.org · 16 hours ago
Geometry-Aware Reinforcement Learning for 2D Irregular Nesting
🔥 PyTorch · Content type: Academic · arxiv.org · 1 day ago
gaelazzo/python_chess: Chess trainer
🎯 Fine-tuning · Content type: Code · github.com · 2 days ago · Hacker News
IAPO: Input Attribution-Aware Policy Optimization for Tool Use in Small Multimodal Agents
AI Agents · Content type: Academic · arxiv.org · 16 hours ago