🎮 Reinforcement Learning
RL, reward functions, policy gradient, RLHF
Scoured 423 posts in 9.2 ms
Test Your Skills Against an AI Air Hockey Robot
🦿
Embodied AI
Content type:
News
hackster.io
·
5d
5 days ago
Flow-DPPO: Divergence Proximal Policy Optimization for Flow Matching Models
✨
Generative AI
Content type:
Academic
arxiv.org
·
15h
15 hours ago
I got so mad at poke(rogue)like that I trained a RL agent to beat it for me
📊
LLM Evaluation
Content type:
Blog
blog.thiagolira.com.br
·
6d
6 days ago
·
Hacker News
2026 FIVB Volleyball Women's Nations League in Nanjing: Poland beats Czech Republic 3-0
📊
LLM Evaluation
ecns.cn
·
5d
5 days ago
Deep reinforcement learning for process design: Review and perspective
✨
Generative AI
Content type:
Academic
arxiv.org
·
1d
1 day ago
Sasha Rush explains targeted on-policy self-distillation, a reinforcement learning technique that corrects specific LLM rollout errors
📊
LLM Evaluation
digg.com
·
6d
6 days ago
Model predictive task sampling for efficient and robust adaptation
⚙️
Prompt Engineering
Content type:
Academic
nature.com
·
1d
1 day ago
SHAPO: Sharpness-Aware Policy Optimization for Safe Exploration
🔬
ML Research
Content type:
Academic
arxiv.org
·
15h
15 hours ago
KJLdefeated/RL.cu: RLVR training for LLM in CUDA/C++
🤖
AI
Content type:
Code
github.com
·
3d
3 days ago
·
Hacker News
ARTA: Adaptive Reinforcement-Learning-Based Throttling Agent for RowHammer Vulnerabilities
🤖
AI Agents
Content type:
Academic
arxiv.org
·
15h
15 hours ago
Nvidia Nemotron 3 Ultra
🤖
AI
research.nvidia.com
·
6d
6 days ago
·
Hacker News
Self-Paced Curriculum Reinforcement Learning for Autonomous Superbike Racing in Simulation
🤖
AI Agents
Content type:
Academic
arxiv.org
·
1d
1 day ago
Bridging Multi-Vector and Learned-Sparse Retrieval, A Diagnostic Framework for Robust Semantic IDs, and More!
🧠
LLMs
Content type:
News
Content type:
Blog
recsys.substack.com
·
5d
5 days ago
·
Substack
Failure Modes of Deep Multi-Agent RL in Asynchronous Pricing: Reproducible Triggers, Trace Diagnostics, and a Partial Fix
📊
LLM Evaluation
Content type:
Academic
arxiv.org
·
15h
15 hours ago
DeepSeek fundraising 💰, Meta model delays ⌛ , Gemma 4 12B 🤖
🤖
AI
tldr.tech
·
6d
6 days ago
UNIQ: Conformal Calibration for Adaptive Conservatism in Offline Reinforcement Learning
🔬
ML Research
Content type:
Academic
arxiv.org
·
1d
1 day ago
Protest against ballot paper shortages enters 2nd day, demanding new election
💉
Prompt Injection
Content type:
News
koreatimes.co.kr
·
4d
4 days ago
·
r/news
Representation-Aware Advantage Estimation: Your Reward Model Provides More Than A Scalar Output
🤖
AI
Content type:
Academic
arxiv.org
·
15h
15 hours ago
Towards End to End Motion Planning and Execution for Autonomous Underwater Vehicles Using Reinforcement Learning
🦾
Motion Planning
Content type:
Academic
arxiv.org
·
1d
1 day ago
Why Robotics Is a Pre-Paradigm Field
🦿
Embodied AI
Content type:
News
whattotelltherobot.com
·
3d
3 days ago
·
Hacker News