🎯 Reinforcement Learning
RL, RLHF, reward models, policy optimization
Behavior-Constrained Reinforcement Learning with Receding-Horizon Credit Assignment for High-Performance Control
💾 Agent Memory · arxiv.org · 4d
Markov Decision Processes: The Language of Reinforcement Learning
💾 Agent Memory · medium.com · 5d
Reinforcement Learning From Human Feedback (RLHF) in Large Language Models (LLMs)
🧠 LLMs · pub.towardsai.net · 6d
Continual learning for AI agents
💾 Agent Memory · blog.langchain.com · 4d · Hacker News
Hierarchical Reinforcement Learning with Augmented Step-Level Transitions for LLM Agents
🌐 World Models · arxiv.org · 2d
Reinforcement Learning for LLM Post-Training: A Survey
🧠 LLMs · arxiv.org · 1d
QaRL: Rollout-Aligned Quantization-Aware RL for Fast and Stable Training under Training–Inference Mismatch
⚡ Inference · arxiv.org · 12h
Reinforcement Learning with LLM-Guided Action Spaces for Synthesizable Lead Optimization
🧠 LLMs · arxiv.org · 12h
Predictive Representations for Skill Transfer in Reinforcement Learning
💾 Agent Memory · arxiv.org · 1d
Large Language Model Post-Training: A Unified View of Off-Policy and On-Policy Learning
🧠 LLMs · arxiv.org · 12h
Reinforcement Learning with Reward Machines for Sleep Control in Mobile Networks
💾 Agent Memory · arxiv.org · 12h
Offline RL for Adaptive Policy Retrieval in Prior Authorization
💾 Agent Memory · arxiv.org · 2d
Aligning Agents via Planning: A Benchmark for Trajectory-Level Reward Modeling
💾 Agent Memory · arxiv.org · 12h
Target Policy Optimization
🌐 World Models · arxiv.org · 2d
PriPG-RL: Privileged Planner-Guided Reinforcement Learning for Partially Observable Systems with Anytime-Feasible MPC
💾 Agent Memory · arxiv.org · 12h
Thompson Sampling for Infinite-Horizon Discounted Decision Processes
🌐 World Models · arxiv.org · 1d
Value Mirror Descent for Reinforcement Learning
💾 Agent Memory · arxiv.org · 2d
Provable Multi-Task Reinforcement Learning: A Representation Learning Framework with Low Rank Rewards
🌐 World Models · arxiv.org · 3d
DROP: Distributional and Regular Optimism and Pessimism for Reinforcement Learning
🌐 World Models · arxiv.org · 1d
Vintix II: Decision Pre-Trained Transformer is a Scalable In-Context Reinforcement Learner
💾 Agent Memory · arxiv.org · 2d