🎮 Reinforcement Learning
RL, Agents, Policy Optimization, Reward Functions
Reinforcement Learning for LLMs
mesuvash.github.io · 4h · Discuss: Hacker News
💬 LLM
Balancing Multiple Objectives in Urban Traffic Control with Reinforcement Learning from AI Feedback
arxiv.org · 22h
AI
Fuz-RL: A Fuzzy-Guided Robust Framework for Safe Reinforcement Learning under Uncertainty
arxiv.org · 22h
AI
https://opentyphoon.ai/blog/en/the-current-landscape-of-reasoning-model-development-91c552b3622b
opentyphoon.ai · 5h
💬 LLM
Sutton & Barto, Ch. 08: Planning & Learning with Tabular Methods (Personal Notes)
chizkidd.github.io · 9h · Discuss: Hacker News
AI
Navigating the Safety-Capability Spectrum when Teaching Agents w/ Feedback - Prithviraj Ammanabrolu
youtube.com · 2h
💬 LLM
Optimal Coding Agent
seanpedersen.github.io · 3h
💬 LLM
Updating Eagleson's Law in the age of Agentic AI
tildes.net · 5h
AI
The Humanoid Robot Generalization Problem Has a New Blueprint
hackernoon.com · 3h
🔥 PyTorch
From RLHF to Community: The New Path for AI Agent Training
dev.to · 3h · Discuss: DEV
🔥 PyTorch
Autonomous AI agents that make money for their keeper
quoroom.ai · 10h · Discuss: Hacker News
AI
Adaptation and Self-Organizing Systems (nlin.AO)
papers.cool · 1d
AI
Breaking through safety performance stagnation in autonomous vehicles with dense learning
nature.com · 22h
💬 LLM
HAL AI Agent Reliability Tracker
hal.cs.princeton.edu · 6h
AI
AI That Lives on the Wire
hackster.io · 9h
AI
Agents
github.github.com · 9h
AI
Training Agents to Self-Report Misbehavior
lesswrong.com · 9h
AI
A Framework for Consequence-Driven Worldbuilding in Games
tagnyx.itch.io · 9h · Discuss: DEV
🌍 Climate Fiction
Towards Efficient Reward Service for RLVR with Request-Level Flexibility and Batch-Level Constraint
usenix.org · 1d
💬 LLM
Building AI Agent Memory Architecture: A Practical Guide to State Management in Autonomous Systems
dev.to · 21h · Discuss: DEV
AI