🎮 Reinforcement Learning
Specific
RL, reward functions, policy gradient, RLHF
Scoured 471 posts in 8.1 ms
Reasoning RL in 2026: GRPO, DPO, RLVR, Agentic PO & Beyond
🤖 AI · turingpost.com · 3 days ago
TT-DAC-PS: Twin-Target Deterministic Actor-Critic with Policy Smoothing for Optimal Trade Execution
♟️ Game Theory · Content type: Academic · arxiv.org · 1 day ago
Tracing Eval-Awareness Emergence Through Training of OLMo 3
🤖 AI · lesswrong.com · 12 hours ago
Researchers develop AI-powered railway control system for efficient urban train operation
🤖 AI · techxplore.com · 9 hours ago
Q-Learning (Reinforcement learning): Bellman Equation, Markov Decision Processes, Q-Values, and…
♟️ Game Theory · Content type: Blog · medium.com · 2 days ago
Reinforcement Learning and Optimal Control Book (RIP Dimitri Bertsekas)
♟️ Game Theory · Content type: Academic · web.mit.edu · 5 days ago · Hacker News
Scale Robot Reinforcement Learning with NVIDIA Isaac Lab on Amazon SageMaker AI
🤖 AI · Content type: Blog · aws.amazon.com · 1 day ago
Less-relevant results
Semi-finalists confirmed in Secondary Schools Volleyball Competition
💻 Computer Science · cbc.bb · 23 hours ago
École secondaire Notre-Dame-du-Sault to hold graduation on June 24
💻 Computer Science · sootoday.com · 6 days ago
Agents Need Work Data: A Primer on RLWD, or Reinforcement Learning on Work Data
🤖 AI · anjalishriva.com · 1 day ago · Hacker News
Edge AI enabled MIMO MC-CDMA for 6G optimizing spectrum and energy efficiency with SIC and deep reinforcement learning
🤖 AI · Content type: Academic · nature.com · 22 hours ago
SimarcLabs/pybullet-swarm-sim: Python framework for simulating drone swarms with PyBullet in seconds.
🤖 AI · Content type: Code · github.com · 3 days ago · r/opensource
China women’s volleyball team finish Nations League leg on a high after opening defeat
♟️ Game Theory · Content type: News · scmp.com · 1 day ago · r/SCMPauto
Time-slip in AI sepsis models may inflate results, risking under- or overtreatment
🤖 AI · medicalxpress.com · 5 days ago
DQN Tutorial - RL Summer School 2026
♟️ Game Theory · araffin.github.io · 1 day ago
Good teachers don’t cheat
♟️ Game Theory · Content type: Blog · jasonkena.github.io · 6 days ago · Hacker News
Mult-DPO: Multinomial Direct Preference Optimization for Recommender Systems
♟️ Game Theory · Content type: Academic · arxiv.org · 18 hours ago
2026 FIVB Volleyball Women's Nations League in Nanjing: Poland beats Czech Republic 3-0
♟️ Game Theory · ecns.cn · 5 days ago
Photos: Syracuse Views Through the Decades
💻 Computer Science · Content type: Academic · news.syr.edu · 1 day ago
Core Automation co-founder Jerry Tworek jokes that Nvidia's CUDA translates to miracles in Polish
🤖 AI · digg.com · 6 days ago