Reinforcement Learning
Specific
RL, reward functions, policy gradient, RLHF
Scoured 437 posts in 9.5 ms
Memoirs of a Learning Machine: Autobiographical Self-Training and the Self-Training Gap
Cognitive Science · zenodo.org · 5 days ago · Hacker News
Multi-agent rendezvous in fluid flows via reinforcement learning
AI Agents · Content type: Academic · arxiv.org · 20 hours ago
Social intelligence Arises Between Minds
Cognitive Science · psychologytoday.com · 4 days ago
Less-relevant results
Semi-finalists confirmed in Secondary Schools Volleyball Competition
LLM · cbc.bb · 2 days ago
Mi50 32GB / GFX906 - vLLM Qwen 3.5 Configuration for Qwen 3.5:9B AWQ-4bit
LLMs · huggingface.co · 2 hours ago · r/LocalLLaMA
Hrithik Roshan Signs With Anonymous Content
Tokenization · Content type: News · deadline.com · 1 day ago
2026 FIVB Volleyball Women's Nations League in Nanjing: Poland beats Czech Republic 3-0
PyTorch · ecns.cn · 6 days ago
Deterministic Policy Gradient for Learning Equilibrium in Time-Inconsistent Control Problems
Optimization · Content type: Academic · arxiv.org · 20 hours ago
Microsoft just shared the frontier data engineering secrets
Data science · mail.bycloud.ai · 2 days ago
A Human-Augmenting Agentic Workflow for Causal Inference
DuckDB · Content type: Blog · netflixtechblog.medium.com · 3 days ago
How to Train Your Goblin
Fine-tuning · goblins.mchen.workers.dev · 4 days ago · Hacker News
Edge AI enabled MIMO MC-CDMA for 6G optimizing spectrum and energy efficiency with SIC and deep reinforcement learning
Machine Learning · Content type: Academic · nature.com · 2 days ago
Researchers trained an open source AI search agent, Harness-1, that outperforms GPT-5.4 on recalling relevant information
Fine-tuning · venturebeat.com · 3 days ago · Hacker News
Phi-Actor-Critic: Steering General-Sum Games to Pareto-Efficient Correlated Equilibria
Optimization · Content type: Academic · arxiv.org · 20 hours ago
Top AI Papers of the Week
Deep Learning · Content type: News · nlp.elvissaravia.com · 4 days ago
How to Stop Shipping Low-Quality RL Environments (with Examples)
Fine-tuning · Content type: News · latent.space · 6 days ago · Hacker News
Protest against ballot paper shortages enters 2nd day, demanding new election
RSS · Content type: News · koreatimes.co.kr · 5 days ago · r/news
Improving Generalization and Data Efficiency with Diffusion in Offline Multi-agent RL
Deep Learning · Content type: Academic · arxiv.org · 20 hours ago
The Exploit Always Wins
Deep Learning · Content type: Blog · abhishek-shankar.com · 6 days ago
Bridging Multi-Vector and Learned-Sparse Retrieval, A Diagnostic Framework for Robust Semantic IDs, and More!
Context Windows · Content type: News, Blog · recsys.substack.com · 6 days ago · Substack