🎮 Reinforcement Learning
RLHF, Reward Models, Policy, Agents
Scoured 181592 posts in 39.1 ms
Dynamical Priors as a Training Objective in Reinforcement Learning
🎲 Stochastic Processes · arxiv.org · 1d
Autoharness: Self-Improving Agents
🕵️ LLM Agents · neosigma.ai · 5d · Hacker News
Rethinking Personalization for the Agent Era, Semantic Recall for Vector Search, and More!
🗄️ Vector Databases · recsys.substack.com · 16h · Substack
Understanding the Most Viral Chart in Artificial Intelligence
💡 AI Reasoning · bloomberg.com · 1h
TUD Lecture on RL #2
💡 AI Reasoning · sebiwette.de · 2d
Reddit as a Reinforcement Learning Gym for Persuasion
🧠 Machine Learning · thediff.co · 19h
Training Isn’t Enough: Reasoning Models and LLMs Need Reinforcement Learning
💡 AI Reasoning · hpcwire.com · 4d
Which one is more important: more parameters or more computation? (2021)
💡 AI Reasoning · parl.ai · 17h · Hacker News
Noise-resilient quantum reinforcement learning
🎯 Kalman Filter · link.aps.org · 2d
Are you paying an AI ‘swarm tax’? Why single agents often beat complex systems
🕵️ LLM Agents · oodaloop.com · 18h
Checkmate! Dominate the Competition by Learning Game Theory with Wolfram Language - Wolfram Blog
💡 AI Reasoning · blog.wolfram.com · 2d
ILC-q optimized hierarchical reinforcement learning for autonomous vehicle path planning
📐 Optimization Theory · sciencedirect.com · 5d
VincenzoManto/Doxa: A YAML-driven multi-agent simulation platform for economic and social systems. It combines LLM-backed agents, market microstructure, relation graphs, and world events behind a FastAPI API and a React client.
🕵️ LLM Agents · github.com · 20h · Hacker News
Proximal Policy Optimization with Clojure and PyTorch
🤖 AI · clojurecivitas.org · 2d · Hacker News
Protecting Cognitive Integrity: Our internal AI use policy (V1)
💡 AI Reasoning · lesswrong.com · 18h
Smarter Doesn’t Mean Safer: A Real RLHF Experiment on LLM Behavior
💬 LLMs · medium.com · 4d
Accelerate RL rollouts by up to 50% with distribution-aware speculative decoding
💬 LLMs · together.ai · 1d
AI for Good
🤖 AI · airmail.news · 10h
Agent-Guided Ranking Policy Improvement for Peptide Drug Candidate Prioritization
🤖 AI · biorxiv.org · 2d
Pact: Trustworthy Coordination for Multi-Agentic Ecosystems
🕵️ LLM Agents · basis.ai · 17h · Hacker News