🎯 Reinforcement Learning
Q-learning, Policy Gradient, Reward Functions, TD Learning
660 AI Agents Ran 27,000 Experiments. Their Biggest Discovery Was a 2015 Textbook Result.
Neuromorphic Hardware · towardsai.net · 2d
TabQL: In-Context Q-Learning with Tabular Foundation Models
🔄 Meta-Learning · arxiv.org · 1d
Wikipedia
🔬 Science · en.wikipedia.org · 6d
Weekly Research Recap
Machine Learning · quantseeker.com · 1d
Randomized Advantage Transformation (RAT): Computing Natural Policy Gradients via Direct Backpropagation
🔄 Meta-Learning · arxiv.org · 2d
A Chinese study monitoring low-frequency time-code signals during the November 2025 geomagnetic storm found that signal strength dropped by over 2.3 dBμV/m and ...
📡 Signal Processing · frontiersin.org · 2d · r/space
Policy Optimization in Hybrid Discrete-Continuous Action Spaces via Mixed Gradients
🎯 Predictive Coding · arxiv.org · 6d
Learning to Hand Off: Provably Convergent Workflow Learning under Interface Constraints
Neuromorphic Hardware · arxiv.org · 1d
An Encoded Corrective Double Deep Q-Networks for Multi-Agent Control Systems
Neuromorphic Hardware · arxiv.org · 6d
Progressive Generalization Augmentation with Deeply Coupled RND-PPO and Domain-Prioritized Noise Injection for Robust Crop Management Reinforcement Learning
🔄 Meta-Learning · arxiv.org · 2d
blevesearch/vellum: A Go library implementing a FST (finite state transducer)
Neuromorphic Hardware · github.com · 2d
DiffusionOPD: A Unified Perspective of On-Policy Distillation in Diffusion Models
🔄 Meta-Learning · arxiv.org · 6d
GAE Falls Short in Imperfect-Information Self-Play Reinforcement Learning
🔄 Meta-Learning · arxiv.org · 1d
Addressing Terminal Constraints in Data-Driven Demand Response Scheduling
Neuromorphic Hardware · arxiv.org · 6d
When Actions Disappear: Adversarial Action Removal in Self-Play Reinforcement Learning
🔄 Meta-Learning · arxiv.org · 2d
$f$-Trajectory Balance: A Loss Family for Tuning GFlowNets, Generative Models, and LLMs with Off- and On-Policy Data
🔄 Meta-Learning · arxiv.org · 3d
DEVIS-GRPO: Unleashing GRPO on Dynamic Extreme View Synthesis
🎯 Predictive Coding · arxiv.org · 2d
Offline Contextual Bandits in the Presence of New Actions
🔄 Meta-Learning · arxiv.org · 2d
ROAD: Adaptive Data Mixing for Offline-to-Online Reinforcement Learning via Bi-Level Optimization
🔄 Meta-Learning · arxiv.org · 6d
Decoupling KL and Trajectories: A Unified Perspective for SFT, DAgger, Offline RL, and OPD in LLM Distillation
🔄 Meta-Learning · arxiv.org · 2d