Reinforcement Learning
Specific
RL, reward functions, policy gradient, RLHF

Scoured 149487 posts in 11.3 ms

Target Policy Optimization
ML Theory · arxiv.org · 2d

Markov Decision Processes: The Language of Reinforcement Learning
Game Theory · medium.com · 4d

Rethinking Robotics Reinforcement Learning: A Practical Humanoid Training Workflow
AI Agents · semiengineering.com · 1d

Reinforcement Learning From Human Feedback (RLHF) in Large Language Models (LLMs)
LLMs · pub.towardsai.net · 6d

Formalizing the "generative crash" via inverse reinforcement learning
AI Agents · news.ycombinator.com · 2d · Hacker News

Three Ways Machines Learn
Machine Learning · medium.com · 3d

Continual learning for AI agents
AI Agents · bestblogs.dev · 4d

QaRL: Rollout-Aligned Quantization-Aware RL for Fast and Stable Training under Training–Inference Mismatch
ML Theory · arxiv.org · 5h

Continual learning for AI agents
AI Agents · blog.langchain.com · 4d · Hacker News

Learning over Forward-Invariant Policy Classes: Reinforcement Learning without Safety Concerns
AI Agents · arxiv.org · 5h

Aligning Agents via Planning: A Benchmark for Trajectory-Level Reward Modeling
AI Agents · arxiv.org · 5h

PriPG-RL: Privileged Planner-Guided Reinforcement Learning for Partially Observable Systems with Anytime-Feasible MPC
AI Agents · arxiv.org · 5h

Value Mirror Descent for Reinforcement Learning
ML Theory · arxiv.org · 2d

Value-Guidance MeanFlow for Offline Multi-Agent Reinforcement Learning
AI Agents · arxiv.org · 5h

Hierarchical Reinforcement Learning with Augmented Step-Level Transitions for LLM Agents
AI Agents · arxiv.org · 2d

Multi-agent Reach-avoid MDP via Potential Games and Low-rank Policy Structure
Game Theory · arxiv.org · 5h

Enhancing sample efficiency in reinforcement-learning-based flow control: replacing the critic with an adaptive reduced-order model
AI Agents · arxiv.org · 2d

Reinforcement Learning with LLM-Guided Action Spaces for Synthesizable Lead Optimization
LLMs · arxiv.org · 5h

Predictive Representations for Skill Transfer in Reinforcement Learning
Machine Learning · arxiv.org · 1d

Active Reward Machine Inference From Raw State Trajectories
ML Theory · arxiv.org · 5h