🎮 Reinforcement Learning
Q-Learning, Policy Gradient, Reward Systems, Game AI, Robotics
Scoured 5541 posts in 20.7 ms
Temper and Tilt Lead to SLOP: Reward Hacking Mitigation with Inference-Time Alignment
⚙ Context engineering · arxiv.org · 2d
Reinforcement Learning for Scalable and Trustworthy Intelligent Systems
⚙ Context engineering · arxiv.org · 4d
D-VLA: A High-Concurrency Distributed Asynchronous Reinforcement Learning Framework for Vision-Language-Action Models
⚙ Context engineering · arxiv.org · 2d
Adaptive Smooth Tchebycheff Attention for Multi-Objective Policy Optimization
🔍 AI Interpretability · arxiv.org · 2d
Trust the Batch, On- or Off-Policy: Adaptive Policy Optimization for RL Post-Training
⚙ Context engineering · arxiv.org · 3d
Teacher-Guided Policy Optimization for LLM Distillation
⚙ Context engineering · arxiv.org · 2d
AHD Agent: Agentic Reinforcement Learning for Automatic Heuristic Design
⚙ Context engineering · arxiv.org · 4d
Diagnosing Training Inference Mismatch in LLM Reinforcement Learning
⚙ Context engineering · arxiv.org · 1d
Generative Floor Plan Design with LLMs via Reinforcement Learning with Verifiable Rewards
⚙ Context engineering · arxiv.org · 1d
Learning Agentic Policy from Action Guidance
⚙ Context engineering · arxiv.org · 3d
HLS-Seek: QoR-Aware Code Generation for High-Level Synthesis via Proxy Comparative Reward Reinforcement Learning
🤝 Multi-Agent Systems · arxiv.org · 2d
Discrete Flow Matching for Offline-to-Online Reinforcement Learning
⚙ Context engineering · arxiv.org · 3d
When Does Hierarchy Help? Benchmarking Agent Coordination in Event-Driven Industrial Scheduling
🤝 Multi-Agent Systems · arxiv.org · 2d
FutureSim: Replaying World Events to Evaluate Adaptive Agents
⚙ Context engineering · arxiv.org · 1d
RankQ: Offline-to-Online Reinforcement Learning via Self-Supervised Action Ranking
🎯 Reranking · arxiv.org · 3d
Bridging Domain Gaps with Target-Aligned Generation for Offline Reinforcement Learning
⚙ Context engineering · arxiv.org · 2d
Continual Harness: Online Adaptation for Self-Improving Foundation Agents
⚙ Context engineering · arxiv.org · 4d · Hacker News
Language-Based Agent Control
🤖 agents · arxiv.org · 2d
How to Interpret Agent Behavior
🤖 agents · arxiv.org · 2d
Model-Driven Policy Optimization in Differentiable Simulators via Stochastic Exploration
⚙ Context engineering · arxiv.org · 5d