Scour
🎮 Reinforcement Learning
RL, reward functions, policy gradient, RLHF
Test-Time Gradient Guidance of Flow Policies in Reinforcement Learning
📊 LLM Evaluation · Content type: Academic · arxiv.org · 8 hours ago
Towards Shutdownable Agents: Generalizing Stochastic Choice in RL Agents and LLMs
🤖 AI · lesswrong.com · 6 days ago
🥇Top AI Papers of the Week
🤖 AI Agents · Content type: News · nlp.elvissaravia.com · 2 days ago
OpenEnv is now owned by HF, Torch, Prime Intellect, Unsloth, Modal, Mercor, and more! Use it for training agents.
🤖 AI · Content type: Blog · huggingface.co · 2 days ago · Hacker News, r/LocalLLaMA
Cohere open-sources a coding agent that runs on a single H100
🤖 AI Agents · venturebeat.com · 14 hours ago
How to Train Your Goblin
🤖 AI · goblins.mchen.workers.dev · 3 days ago · Hacker News
Event-Driven Reinforcement Learning Enables Long-Horizon Control in Semiconductor Fabrication
🌀 Procedural Generation · Content type: Academic · arxiv.org · 8 hours ago
I got so mad at poke(rogue)like that I trained a RL agent to beat it for me
📊 LLM Evaluation · thiagolira.blot.im · 2 days ago · Hacker News
Time-slip in AI sepsis models may inflate results, risking under- or overtreatment
📚 CS Research · medicalxpress.com · 4 days ago
Geometry-Aware Reinforcement Learning for 2D Irregular Nesting
🦾 Motion Planning · Content type: Academic · arxiv.org · 8 hours ago
A Functional Taxonomy of World Models
🤖 AI Agents · a16z.news · 6 days ago
China women’s volleyball team finish Nations League leg on a high after opening defeat
👁️ Computer Vision · Content type: News · scmp.com · 1 day ago · r/SCMPauto
NVIDIA Enables the Next Era Of Physical AI Research With Agent Skills For Autonomous Vehicles, Robotics And Vision AI
🦿 Embodied AI · Content type: Blog · blogs.nvidia.com · 6 days ago
3SPO: State-Score-Supervised Policy Optimization for LLM Agents
🤖 AI Agents · Content type: Academic · arxiv.org · 8 hours ago
Memoirs of a Learning Machine: Autobiographical Self-Training and the Self-Training Gap
📚 CS Research · zenodo.org · 3 days ago · Hacker News
Model predictive task sampling for efficient and robust adaptation
⚙️ Prompt Engineering · Content type: Academic · nature.com · 1 day ago
Dmsh: A Multi-Agent Reinforcement Learning Framework for All-Quad Mesh Generation
🤖 AI Agents · Content type: Academic · arxiv.org · 8 hours ago
École secondaire Notre-Dame-du-Sault to hold graduation on June 24
🧠 LLMs · sootoday.com · 5 days ago
When RL Fails after SFT: Rejuvenating Model Plasticity for Robust SFT-to-RL Handoff
🧠 LLMs · Content type: Academic · arxiv.org · 8 hours ago
The Exploit Always Wins
🔬 ML Research · Content type: Blog · abhishek-shankar.com · 4 days ago