🎮 Reinforcement Learning
RL, reward functions, policy gradient, agents, simulation
Building Better Software with AI Agents: Why Fundamentals Still Matter
🧠 AI Agents · youtu.be · 3d · DEV
Show HN: A live autonomous economic network for AI agents
🧠 AI Agents · ainetwork-global.github.io · 3d · Hacker News
On-Policy vs Off-Policy RL: PPO vs SAC on 5 Gymnasium Tasks
🕸️ Distributed Systems · tildalice.io · 4d
Software Agents: The management challenge
🧠 AI Agents · hypecycles.com · 6d
Lyapunov-Guided Self-Alignment: Test-Time Adaptation for Offline Safe Reinforcement Learning
🕸️ Distributed Systems · arxiv.org · 20h
FutureWorld: A Live Environment for Training Predictive Agents with Real-World Outcome Rewards
🧠 AI Agents · arxiv.org · 20h
A new GitHub repo to detect reward hacking in RL models
⚙️ MLOps · github.com · 4d · Hacker News
Rule-based High-Level Coaching for Goal-Conditioned Reinforcement Learning in Search-and-Rescue UAV Missions Under Limited-Simulation Training
🚗 Autonomous Systems · arxiv.org · 20h
Uncertainty-Aware Reward Discounting for Mitigating Reward Hacking
🕸️ Distributed Systems · arxiv.org · 20h
Policy Improvement Reinforcement Learning
🧠 AI Agents · arxiv.org · 1d
A Survey of Multi-Agent Deep Reinforcement Learning with Graph Neural Network-Based Communication
🧠 AI Agents · arxiv.org · 20h
Addressing Performance Saturation for LLM RL via Precise Entropy Curve Control
🧠 LLMs · arxiv.org · 20h
DORA: A Scalable Asynchronous Reinforcement Learning System for Language Model Training
🧠 LLMs · arxiv.org · 20h
K-Score: Kalman Filter as a Principled Alternative to Reward Normalization in Reinforcement Learning
⚙️ MLOps · arxiv.org · 2d
On the Complexity of Robust Markov Decision Processes and Bisimulation Metrics
🕸️ Distributed Systems · arxiv.org · 20h
RL Token: Bootstrapping Online RL with Vision-Language-Action Models
🧠 LLMs · arxiv.org · 2d
Bian Que: An Agentic Framework with Flexible Skill Arrangement for Online System Operations
🕸️ Distributed Systems · arxiv.org · 20h
DPEPO: Diverse Parallel Exploration Policy Optimization for LLM-based Agents
🧠 AI Agents · arxiv.org · 2d
I Would If I Could: Reasoning about Dynamics of Actions in Multi-Agent Systems
🧠 AI Agents · arxiv.org · 20h
Preserving Disagreement: Architectural Heterogeneity and Coherence Validation in Multi-Agent Policy Simulation
🧠 AI Agents · arxiv.org · 20h