🎮 Reinforcement Learning
Q-Learning, Policy Gradients, Environments, Rewards
Temperature as a Meta-Policy: Adaptive Temperature in LLM Reinforcement Learning · arxiv.org · 1d · AI
Explainable Causal Reinforcement Learning for heritage language revitalization programs with inverse simulation verification · dev.to · 7h · Discuss: DEV · AI
CM2: Reinforcement Learning with Checklist Rewards for Multi-Turn and Multi-Step Agentic Tool Use · arxiv.org · 1d · 🌐 Distributed Systems
Multi-armed bandit · en.wikipedia.org · 1d · AI
Optimizing post-disaster road restoration with reinforcement learning: A traveler-behavior-aware approach · sciencedirect.com · 2d · 🌐 Distributed Systems
check out this article on Reinforcement Learning with R: Origins, Real-Life Applications, and Practical Implementation · dev.to · 4d · Discuss: DEV · AI
Power of Agent assisted coding and learning to achieve goals faster and cheaper · osm2pgsql.org · 2h · Discuss: DEV · AI
#0186: What We Let Machines Do · matthewsinclair.medium.com · 3h · AI
Forge: Scalable Agent RL Framework and Algorithm · minimax.io · 1d · Discuss: Hacker News · 🌐 Distributed Systems
The implementation for the drifting model · breno.bearblog.dev · 1d · AI
Painless Activation Steering (PAS): Automated, Lightweight Post-Training for LLM Behavior · sashacui.substack.com · 10h · Discuss: Substack · 🔀 Transformers
Functional distinctions between orbitofrontal cortex and anterior cingulate cortex subregions in decision-making and autonomic regulation · nature.com · 4h · 🔀 Transformers
Show HN: Fighting the War Against Expensive Reinforcement Learning · cadenza-landing-qtu7gbjwb-akshparekh123-3457s-projects.vercel.app · 2d · Discuss: Hacker News · AI
Optimal timing for superintelligence · marginalrevolution.com · 1d · 🌐 Distributed Systems
A Conceptual Framework for Exploration Hacking · lesswrong.com · 2d · 🔧 Feature Engineering
MiniMax-AI/MiniMax-M2.5 · github.com · 4h · AI
I Built a Smart Movie Recommender with Collaborative Filtering · analyticsvidhya.com · 3h · 🧭 Vector Databases
Decoding urban soundscapes: spatial prediction and influence mechanism analysis with interpretable semi-supervised learning · sciencedirect.com · 1h · 🔧 Feature Engineering
Completed Hyperparameter Transfer across Modules, Width, Depth, Batch and Duration · machinelearning.apple.com · 1d · 🔧 Feature Engineering
Swift to Harbour, Slow to Berth · joehalliwell.com · 2h · 🌐 Distributed Systems