Scour
🎮 Reinforcement Learning
RL, reward functions, policy gradient, RLHF
Scoured 365 posts in 11.3 ms
🎯 RLHF
arXiv · 1 day ago
Bias-Controlled Primal-Dual Natural Actor-Critic: Optimal Rates for Constrained Multi-Objective Average-Reward RL
🤖 LLM, Agent
The Batch · 6 days ago
Jun 19, 2026
🏗️ AI Infrastructure
IT之家 · 3 days ago
Ming-Chi Kuo: Google is developing an inference-optimized upgrade of its TPU v9 chip, with MediaTek winning the order
🧠 Context Engineering
arXiv · 1 day ago
Compositional Behavioral Semantics for State Abstraction in Reinforcement Learning
🤖 AI agent development
ujangriswanto08.medium.com · 6 days ago
How SARSA Trains Smarter Agents Through On-Policy Updates
🔬 AI Research
arXiv · 2 days ago
EMAgnet: Parameter-Space EMA Regularization for Policy Gradient Self-Play in Large Games
🧠 LLM Training
arXiv · 2 days ago
Weight-Space Geometry of Offline Reasoning Training
🎯 RLHF
arXiv · 2 days ago
An Introduction to Causal Reinforcement Learning
🔬 AI Research
arXiv · 34 minutes ago
GEOALIGN: Geometric Rollout Curation for Robust LLM Reinforcement Learning
🤖 AI agent development
arXiv · 2 days ago
Reinforcement Learning for Computer-Use Agents with Autonomous Evaluation
⚡ LLM Optimization
arXiv · 34 minutes ago
State Representation Matters in Deep Reinforcement Learning: Application to Energy Trading
🎯 RLHF
arXiv · 1 day ago
MAPL: Multi-Objective Preference Learning for Robot Locomotion
🎯 RLHF
arXiv · 34 minutes ago
Deterministic Pareto-Optimal Policy Synthesis for Multi-Objective Reinforcement Learning
🔬 AI Research
arXiv · 2 days ago
LaGO: Latent Action Guidance for Online Reinforcement Learning
🔬 AI Research
arXiv · 1 day ago
FactorLibrary: From Polynomials to Circuits via Recursive Subgoals
🛡️ AI Safety
arXiv · 2 days ago
Reinforcement Learning Towards Broadly and Persistently Beneficial Models
🔄 Meta-Learning
arXiv · 34 minutes ago
VoiceTTA: Enhancing Zero-Shot Text-to-Speech via Reinforcement Learning-Based Test-Time Adaptation
🕸️ Multi-Agent Systems
arXiv · 1 day ago
Low Variance Trust Region Optimization with Independent Actors and Sequential Updates in Cooperative Multi-agent Reinforcement Learning
🔀 LoRA
arXiv · 1 day ago
Memory-Efficient Policy Libraries with Low-Rank Adaptation in Reinforcement Learning
🎯 RLHF
arXiv · 1 day ago
Supervised Reinforcement Learning for the Coordination of Distributed Energy Resources