Scour
🎮 Reinforcement Learning
RL, reward function, policy, agents, Q-learning
Scoured 133 posts in 12.6 ms
Resolving Action Bottleneck: Agentic Reinforcement Learning Informed by Token-Level Energy
🧠 Neuromorphic Computing · arxiv.org · 6d
A Red Teaming Framework for Evaluating Robustness of AI-enabled Security Orchestration, Automation, and Response Systems
🛡️ AI Safety · arxiv.org · 2d
Prompt Optimization for LLM Code Generation via Reinforcement Learning
📝 LLMs · arxiv.org · 1d
Policy Optimization in Hybrid Discrete-Continuous Action Spaces via Mixed Gradients
✨ Generative AI · arxiv.org · 6d
The Dynamics of Policy Gradient in Social Dilemmas with Partner Selection
⚖️ AI Ethics · arxiv.org · 2d
Learning-Zone Energy: Online Data Selection for Efficient RL Post-Training
🧠 Machine Learning · arxiv.org · 2d
Critic-Driven Voronoi-Quantization for Distilling Deep RL Policies to Explainable Models
🔥 PyTorch · arxiv.org · 6d
Weak-to-Strong Elicitation via Mismatched Wrong Drafts
🔍 RAG · arxiv.org · 2d
Clock-state olfactory search in turbulent flows using Q-learning: The geometry of plume recovery
🧠 Neuromorphic Computing · arxiv.org · 3d
When Critics Disagree: Adaptive Reward Poisoning Attacks in RIS-Aided Wireless Control System
🧠 Neuromorphic Computing · arxiv.org · 1d
Emergence of a Flow-Assisted Casting Strategy for Olfactory Navigation via Memory-Augmented Reinforcement Learning
🧠 Neuromorphic Computing · arxiv.org · 1d
Temporal Fair Division in Multi-Agent Systems: From Precise Alternation Metrics to Scalable Coordination Proxies
⚖️ AI Ethics · arxiv.org · 6d
A Machine with Short-Term, Episodic, and Semantic Memory Systems
🧠 Neuromorphic Computing · arxiv.org · 2d
Convergence of Stochastic First-Order Algorithms in Bertrand Competition Under Incomplete Information
🧠 Machine Learning · arxiv.org · 2d
Chrono-Gymnasium: An Open-Source, Gymnasium-Compatible Distributed Simulation Framework
🔥 PyTorch · arxiv.org · 6d
Shared Backbone PPO for Multi-UAV Communication Coverage with Connection Preservation
🔥 PyTorch · arxiv.org · 2d
Curriculum-Guided Heterogeneous Multi-Agent Intelligence for Multi-UAV Cooperative ISAC
🔐 Cybersecurity · arxiv.org · 2d
Dual Hierarchical Dialogue Policy Learning for Legal Inquisitive Conversational Agents
📝 LLMs · arxiv.org · 6d
Addressing Terminal Constraints in Data-Driven Demand Response Scheduling
🧠 Machine Learning · arxiv.org · 6d
Randomized Advantage Transformation (RAT): Computing Natural Policy Gradients via Direct Backpropagation
🔥 PyTorch · arxiv.org · 2d