🎯 Reinforcement Learning
Q-learning, Policy Gradient, Reward Functions, TD Learning
Weak-to-Strong Elicitation via Mismatched Wrong Drafts
🔄 Meta-Learning · arxiv.org · 2d
Response-Conditioned Parallel-to-Sequential Orchestration for Multi-Agent Systems
🎯 Predictive Coding · arxiv.org · 3d
QuantFPFlow: Quantum Amplitude Estimation for Fokker-Planck Policy Optimisation in Continuous Reinforcement Learning
Neuromorphic Hardware · arxiv.org · 2d
On Gaussian approximation for entropy-regularized Q-learning with function approximation
🌳 recursive neural networks · arxiv.org · 2d
Resolving Action Bottleneck: Agentic Reinforcement Learning Informed by Token-Level Energy
🔄 Meta-Learning · arxiv.org · 6d
Equilibrium Selection in Multi-Agent Policy Gradients via Opponent-Aware Basin Entry
🌳 recursive neural networks · arxiv.org · 2d
When Critics Disagree: Adaptive Reward Poisoning Attacks in RIS-Aided Wireless Control System
Neuromorphic Hardware · arxiv.org · 1d
Identifying Culprits Through Deep Deterministic Policy Gradient Deep Learning Investigation
Machine Learning · arxiv.org · 6d
Prompt Optimization for LLM Code Generation via Reinforcement Learning
🌳 recursive neural networks · arxiv.org · 1d
Distributed Zeroth-Order Policy Gradient for Networked Multi-agent Reinforcement Learning from Human Feedback
Neuromorphic Hardware · arxiv.org · 3d
The Dynamics of Policy Gradient in Social Dilemmas with Partner Selection
Neuromorphic Hardware · arxiv.org · 2d
Scalable Bi-causal Optimal Transport via KL Relaxation and Policy Gradients
🔄 Meta-Learning · arxiv.org · 2d
Clock-state olfactory search in turbulent flows using Q-learning: The geometry of plume recovery
Neuromorphic Hardware · arxiv.org · 3d
Shared Backbone PPO for Multi-UAV Communication Coverage with Connection Preservation
🕸 Mesh Networking · arxiv.org · 2d
EUPHORIA: Efficient Universal Planning via Hybrid Optimization for Robust Industrial Robotic Assembly
Soft Robotics · arxiv.org · 1d
Flow Field Reconstruction with Sensor Placement Policy Learning
Neuromorphic Computing · arxiv.org · 6d
Leveraging Deep Reinforcement Learning for Clustered Cell-Free Networking Over User Mobility
🕸 Mesh Networking · arxiv.org · 2d
AIS: Adaptive Importance Sampling for Quantized RL
🔄 Meta-Learning · arxiv.org · 6d
A Heuristic Approach for Performance Tuning in RL-based Quadrotor Control via Reward Design and Termination Conditions
Robotics · arxiv.org · 1d
A Multi-Layer Cloud-IDS Pipeline with LLM and Adaptive Q-Learning Calibration
🔄 Meta-Learning · arxiv.org · 3d