Scour
Reinforcement Learning
Specific
RL, reward functions, policy gradient, RLHF
Scoured 388 posts in 7.8 ms
Experts weigh in on Anthropic’s Fable 5, Mythos 5 releases
📐 Formal Methods · sdtimes.com · 1 day ago
I got so mad at poke(rogue)like that I trained a RL agent to beat it for me
🤖 Machine Learning · thiagolira.blot.im · 3 days ago · Hacker News
Space-sampled Value Decay: Forgetting Mechanisms for Non-stationary Deep Reinforcement Learning
💬 LLMs · Content type: Academic · arxiv.org · 12 hours ago
How AI chatbots become better learning coaches
💬 LLMs · techxplore.com · 14 hours ago
🥇Top AI Papers of the Week
🤖 AI · Content type: News · nlp.elvissaravia.com · 4 days ago
Mbodi AI (YC P25) Is Hiring Founding Machine Learning Engineer (Robotics)
🤖 AI · ycombinator.com · 5 days ago · Hacker News
San Francisco Construction Security Company: Complete Guide to Protecting Your Job Site in 2026
💻 Tech Industry · Content type: Blog · medium.com · 2 days ago
CCKS: Consensus-based Communication and Knowledge Sharing
🖧 Distributed Systems · Content type: Academic · arxiv.org · 12 hours ago
Edge AI enabled MIMO MC-CDMA for 6G optimizing spectrum and energy efficiency with SIC and deep reinforcement learning
🤖 Machine Learning · Content type: Academic · nature.com · 1 day ago
The Exploit Always Wins
✍️ Prompt Engineering · Content type: Blog · abhishek-shankar.com · 6 days ago
Comp.compilers: Paper: MileStone: A Multi-Objective Compiler Phase Ordering Framework for Graph-based IR-Level Optimization
⚙️ Compilers · compilers.iecc.com · 5 days ago
You're doing it wrong
🍳 Cooking · Content type: News · understandably.com · 2 days ago
Variational Proximal Policy Optimization
🤖 Machine Learning · Content type: Academic · arxiv.org · 2 days ago
Bridging Multi-Vector and Learned-Sparse Retrieval, A Diagnostic Framework for Robust Semantic IDs, and More!
💬 LLMs · Content type: News, Blog · recsys.substack.com · 5 days ago · Substack
SLUUG Talk: Demystifying Large Language Models on Linux
🤖 AI · Content type: Code · github.com · 4 days ago · DEV
Geometrically Averaged Hard Target Updates for Linear Q-Learning
🤖 Machine Learning · Content type: Academic · arxiv.org · 1 day ago
Sequent: scale and automation for higher confidence in alignment
🤖 AI · lesswrong.com · 1 day ago
HERO: Hindsight-Enhanced Reflection from Environment Observations for Agentic Self-Distillation
🏗️ AI Infrastructure · Content type: Academic · arxiv.org · 12 hours ago
BeatpulseLabs raises $1.8M pre-seed to scale AI training data
🤖 Machine Learning · Content type: News · tech.eu · 3 days ago
Fast and Highly Expressive Policy Learning for Offline Reinforcement Learning via Bootstrapped Flow Q-Learning
🏗️ AI Infrastructure · Content type: Academic · arxiv.org · 1 day ago