Reinforcement Learning
Specific: RL, reward functions, policy gradient, RLHF

FutureWorld: A Live Environment for Training Predictive Agents with Real-World Outcome Rewards
AI Agents · arxiv.org · 23h

Sample-efficient Neuro-symbolic Proximal Policy Optimization
Neural Networks · arxiv.org · 1d

Digital Twin-assisted belief-state reinforcement learning for latency-robust ISAC in 6G networks
ML Theory · arxiv.org · 23h

SpecRLBench: A Benchmark for Generalization in Specification-Guided Reinforcement Learning
ML Theory · arxiv.org · 2d

Robust Representation Learning through Explicit Environment Modeling
Machine Learning · arxiv.org · 23h

From Coarse to Fine: Self-Adaptive Hierarchical Planning for LLM Agents
AI Agents · arxiv.org · 2d

A Survey of Multi-Agent Deep Reinforcement Learning with Graph Neural Network-Based Communication
AI Agents · arxiv.org · 23h

Reward Models Are Secretly Value Functions: Temporally Coherent Reward Modeling
Game Theory · arxiv.org · 2d

Safe Navigation using Neural Radiance Fields via Reachable Sets
ML Theory · arxiv.org · 23h

TSN-Affinity: Similarity-Driven Parameter Reuse for Continual Offline Reinforcement Learning
Machine Learning · arxiv.org · 1d

NeuroPlastic: A Plasticity-Modulated Optimizer for Biologically Inspired Learning Dynamics
Neural Networks · arxiv.org · 23h

Frictive Policy Optimization for LLMs: Epistemic Intervention, Risk-Sensitive Control, and Reflective Alignment
ML Theory · arxiv.org · 1d

Dynamical Priors as a Training Objective in Reinforcement Learning
ML Theory · arxiv.org · 6d

Quantum Grover Adaptive Search for Discrete Simulation Optimization
ML Theory · arxiv.org · 23h

How Fast Should a Model Commit to Supervision? Training Reasoning Models on the Tsallis Loss Continuum
ML Theory · arxiv.org · 1d

Split over $n$ resource sharing problem: Are fewer capable agents better than many simpler ones?
AI Agents · arxiv.org · 23h

Safe-Support Q-Learning: Learning without Unsafe Exploration
Machine Learning · arxiv.org · 1d

SOLAR-RL: Semi-Online Long-horizon Assignment Reinforcement Learning
AI Agents · arxiv.org · 3d

Bian Que: An Agentic Framework with Flexible Skill Arrangement for Online System Operations
AI Agents · arxiv.org · 23h

Dyna-Style Safety Augmented Reinforcement Learning: Staying Safe in the Face of Uncertainty
AI Agents · arxiv.org · 1d