Scour
🎮 RL
reinforcement learning, reward, policy gradient, Q-learning
Protest against ballot paper shortages enters 2nd day, demanding new election
🤝 Consensus Algorithms
Content type: News
koreatimes.co.kr · 4 days ago · r/news
Discovering Interpretable Multi-Parameter Control Policies for Evolutionary Algorithms Using Deep Reinforcement Learning
🧠 Deep Learning
Content type: Academic
arxiv.org · 22 hours ago
Training Deliberative Monitors for Black-Box Scheming Detection
🎯 Reinforcement Learning from Human Feedback
lesswrong.com · 6 days ago
[AINews] Reve 2 and Ideogram 4: Layouts in Imagegen
🤖 ML
latent.space · 6 days ago
Comp.compilers: Paper: MileStone: A Multi-Objective Compiler Phase Ordering Framework for Graph-based IR-Level Optimization
🧠 Deep Learning
compilers.iecc.com · 5 days ago
Beyond Uniform Token-Level Trust Region in LLM Reinforcement Learning
🎯 Reinforcement Learning from Human Feedback
Content type: Academic
arxiv.org · 22 hours ago
Development of COVID-19 Booster Vaccine Policy by Microsimulation and Q-learning
🧠 Deep Learning
Content type: Academic
arxiv.org · 22 hours ago
Deep reinforcement learning for process design: Review and perspective
🎯 Reinforcement Learning from Human Feedback
Content type: Academic
arxiv.org · 1 day ago
3SPO: State-Score-Supervised Policy Optimization for LLM Agents
🎯 Reinforcement Learning from Human Feedback
Content type: Academic
arxiv.org · 22 hours ago
Reinforcement learning in linear embedding space unlocks generalizable control across soft robot configurations
🎯 Reinforcement Learning from Human Feedback
Content type: Academic
nature.com · 3 days ago
HIPIF: Hierarchical Planning and Information Folding for Long-Horizon LLM Agent Learning
🎯 Reinforcement Learning from Human Feedback
Content type: Academic
arxiv.org · 22 hours ago
Failure Modes of Deep Multi-Agent RL in Asynchronous Pricing: Reproducible Triggers, Trace Diagnostics, and a Partial Fix
🎯 Reinforcement Learning from Human Feedback
Content type: Academic
arxiv.org · 22 hours ago
Self-Paced Curriculum Reinforcement Learning for Autonomous Superbike Racing in Simulation
🎯 Reinforcement Learning from Human Feedback
Content type: Academic
arxiv.org · 1 day ago
Flow-DPPO: Divergence Proximal Policy Optimization for Flow Matching Models
🎯 Reinforcement Learning from Human Feedback
Content type: Academic
arxiv.org · 22 hours ago
ARTA: Adaptive Reinforcement-Learning-Based Throttling Agent for RowHammer Vulnerabilities
🛠️ Systems Programming
Content type: Academic
arxiv.org · 22 hours ago
Towards End to End Motion Planning and Execution for Autonomous Underwater Vehicles Using Reinforcement Learning
🎯 Reinforcement Learning from Human Feedback
Content type: Academic
arxiv.org · 1 day ago
SHAPO: Sharpness-Aware Policy Optimization for Safe Exploration
🎯 Reinforcement Learning from Human Feedback
Content type: Academic
arxiv.org · 22 hours ago
UNIQ: Conformal Calibration for Adaptive Conservatism in Offline Reinforcement Learning
🎯 Reinforcement Learning from Human Feedback
Content type: Academic
arxiv.org · 1 day ago
Reasoning or Memorization? Direction-Aware Diversity Exploration in LLM Reinforcement Learning
🎯 Reinforcement Learning from Human Feedback
Content type: Academic
arxiv.org · 22 hours ago
Representation-Aware Advantage Estimation: Your Reward Model Provides More Than A Scalar Output
🎯 Reinforcement Learning from Human Feedback
Content type: Academic
arxiv.org · 22 hours ago