🎮 Reinforcement Learning
RL, reward functions, policy gradient, RLHF

Scoured 437 posts in 10.5 ms

Merging model-based control with multi-agent reinforcement learning for multi-agent cooperative teaming strategies
🤖 AI Agents · Academic · arxiv.org · 6 days ago

How to Implement a Model-Free RL Algorithm: A Step-by-Step Guide
🤖 AI Agents · Blog · ujangriswanto08.medium.com · 18 hours ago

Researchers develop AI-powered railway control system for efficient urban train operation
🤖 Automation · techxplore.com · 1 day ago

Agents Need Work Data: A Primer on RLWD, or Reinforcement Learning on Work Data
🤖 AI Agents · anjalishriva.com · 2 days ago · Hacker News

I Got Tired of Rebuilding My Retro RL Projects
🔥 PyTorch · Blog · medium.com · 14 hours ago

Q-Learning (Reinforcement learning): Bellman Equation, Markov Decision Processes, Q-Values, and…
📈 Optimization · Blog · medium.com · 3 days ago

Scale Robot Reinforcement Learning with NVIDIA Isaac Lab on Amazon SageMaker AI
🦾 Robotics · Blog · aws.amazon.com · 2 days ago

Reasoning RL in 2026: GRPO, DPO, RLVR, Agentic PO & Beyond
🎯 Fine-tuning · turingpost.com · 4 days ago

Less-relevant results
GermRL: Alleviating The Germline Bias In Autoregressive Antibody Language Models Through Reinforcement Learning
🧠 LLMs · Academic · biorxiv.org · 5 hours ago

Reinforcement-learning signals support dynamic adaptive control during language switching
🤖 Data science · Academic · nature.com · 1 day ago

SimarcLabs/pybullet-swarm-sim: Python framework for simulating drone swarms with PyBullet in seconds.
🪨 Obsidian · Code · github.com · 4 days ago · r/opensource

Some Interesting Papers on RLVR
🎯 Fine-tuning · lesswrong.com · 2 days ago

Mi50 32GB / GFX906 - vLLM Qwen 3.5 Configuration for Qwen 3.5:9B AWQ-4bit
🧠 LLMs · huggingface.co · 57 minutes ago · r/LocalLLaMA

Reinforcement Learning and Optimal Control Book (RIP Dimitri Bertsekas)
📈 Optimization · Academic · web.mit.edu · 6 days ago · Hacker News

Deep Reinforcement Learning for Adaptive Power Allocation in ISAC Systems with Mobile Target
📈 Optimization · Academic · arxiv.org · 19 hours ago

Cohere open-sources a coding agent that runs on a single H100
🎯 Fine-tuning · venturebeat.com · 2 days ago

Time-slip in AI sepsis models may inflate results, risking under- or overtreatment
🤖 Data science · medicalxpress.com · 6 days ago

DQN Tutorial - RL Summer School 2026
🧠 LLM Inference · araffin.github.io · 2 days ago

AI Agent Mastery & Coaching
🔢 Embeddings · ruv.io · 3 days ago

China women’s volleyball team finish Nations League leg on a high after opening defeat
📈 Economics · News · scmp.com · 2 days ago · r/SCMPauto