Reinforcement Learning
RL, reward functions, policy gradient, RLHF
Scoured 242 posts in 5.4 ms
Merging model-based control with multi-agent reinforcement learning for multi-agent cooperative teaming strategies
AI Agents · Content type: Academic · arxiv.org · 6 days ago
Agents Need Work Data: A Primer on RLWD, or Reinforcement Learning on Work Data
AI Agents · anjalishriva.com · 2 days ago · Hacker News
Researchers trained an open source AI search agent, Harness-1, that outperforms GPT-5.4 on recalling relevant information
Fine-tuning · venturebeat.com · 2 days ago · Hacker News
Reinforcement Learning and Optimal Control Book (RIP Dimitri Bertsekas)
Optimization · Content type: Academic · web.mit.edu · 6 days ago · Hacker News
Less-relevant results
Propel: Breaking the Solver Bottleneck in Task-Generator RL
Fine-tuning · vmax.ai · 23 hours ago · Hacker News
Introducing North Mini Code: Cohere's First Model For Developers
Fine-tuning · Content type: Blog · huggingface.co · 2 days ago · Hacker News
Memoirs of a Learning Machine: Autobiographical Self-Training and the Self-Training Gap
Cognitive Science · zenodo.org · 4 days ago · Hacker News
Why LLMs (still) lack taste
LLM · beyondtheprior.com · 2 days ago · Hacker News
How to Train Your Goblin
Fine-tuning · goblins.mchen.workers.dev · 4 days ago · Hacker News
Multi-agent rendezvous in fluid flows via reinforcement learning
AI Agents · Content type: Academic · arxiv.org · 17 hours ago
How to Stop Shipping Low-Quality RL Environments (with Examples)
Fine-tuning · Content type: News · latent.space · 6 days ago · Hacker News
Deterministic Policy Gradient for Learning Equilibrium in Time-Inconsistent Control Problems
Optimization · Content type: Academic · arxiv.org · 17 hours ago
I got so mad at poke(rogue)like that I trained a RL agent to beat it for me
AI Agents · thiagolira.blot.im · 4 days ago · Hacker News
AI-powered living business intelligence network
DuckDB · atlasforgex.com · 1 day ago · Hacker News
Phi-Actor-Critic: Steering General-Sum Games to Pareto-Efficient Correlated Equilibria
Optimization · Content type: Academic · arxiv.org · 17 hours ago
Risk Has an Owner, and It's Not the AI
Automation · Content type: Blog · aaddrick.com · 4 days ago · Hacker News
[NEW MODEL] SupraLabs just released Supra1.5-50M Base (Experimental)!
Tokenization · huggingface.co · 9 hours ago · r/LocalLLaMA
LLM Research Papers: The 2026 List (January to May)
Natural Language Processing · Content type: News · magazine.sebastianraschka.com · 5 days ago · Hacker News
TT-DAC-PS: Twin-Target Deterministic Actor-Critic with Policy Smoothing for Optimal Trade Execution
Optimization · Content type: Academic · arxiv.org · 2 days ago
A wild idea: Abstract reality using ontology
LLM · Content type: Discussion · news.ycombinator.com · 5 days ago · Hacker News