Reinforcement Learning
Q-Learning, Policy Gradients, Environments, Rewards
Scoured 127 posts in 5.9 ms
Test-Time Gradient Guidance of Flow Policies in Reinforcement Learning
AI · Content type: Academic · arxiv.org · 22 hours ago
The Neutral Mask: How RLHF Provides Shallow Alignment while Leaving Partisan Structure Intact in a Large Language Model
AI · Content type: Academic · arxiv.org · 1 day ago
ARTA: Adaptive Reinforcement-Learning-Based Throttling Agent for RowHammer Vulnerabilities
Distributed Systems · Content type: Academic · arxiv.org · 22 hours ago
Performance Variation in Deep Reinforcement Learning
AI · Content type: Academic · arxiv.org · 2 days ago
Do We Want a Superintelligent People-Pleaser?
AI · lesswrong.com · 5 days ago
Failure Modes of Deep Multi-Agent RL in Asynchronous Pricing: Reproducible Triggers, Trace Diagnostics, and a Partial Fix
AI · Content type: Academic · arxiv.org · 22 hours ago
Less-relevant results
umair-tareen/philosopher-council: An eleven-philosopher LLM council - ask it questions or point it at AI-research trends. Claude-powered deliberation through the four classical branches of philosophy. Methodology, not metaphysics.
AI · Content type: Code · github.com · 5 days ago · r/SideProject
UNIQ: Conformal Calibration for Adaptive Conservatism in Offline Reinforcement Learning
AI · Content type: Academic · arxiv.org · 1 day ago
A Unifying Lens on Reward Uncertainty in RLHF
AI · Content type: Academic · arxiv.org · 1 day ago
SHAPO: Sharpness-Aware Policy Optimization for Safe Exploration
AI · Content type: Academic · arxiv.org · 22 hours ago
Multilingual Sentiment Aware Text Summarization A Reinforcement Learning Approach for Consistency Maintenance
AI · Content type: Academic · arxiv.org · 1 day ago
Hidden Consensus: Preference-Validity Compression in Human Feedback
AI · Content type: Academic · arxiv.org · 22 hours ago
My research agenda and work
AI · lesswrong.com · 5 days ago
A Regret Minimization Framework on Preference Learning in Large Language Models
AI · Content type: Academic · arxiv.org · 1 day ago
Mechanistic Analysis of Alignment Algorithms in Language Models
Transformers · Content type: Academic · arxiv.org · 22 hours ago
GIFT: LLM-Guided State-Reward Interface for Financial Reinforcement Learning
AI · Content type: Academic · arxiv.org · 1 day ago
Towards End to End Motion Planning and Execution for Autonomous Underwater Vehicles Using Reinforcement Learning
AI · Content type: Academic · arxiv.org · 1 day ago
SocraticPO: Policy Optimization via Interactive Guidance
AI · Content type: Academic · arxiv.org · 22 hours ago
Structure-Conditioned Actor-Critic Branches for Quality-Diversity Reinforcement Learning
AI · Content type: Academic · arxiv.org · 1 day ago
Policy Gradient for Continuous-Time Robust Markov Decision Processes
AI · Content type: Academic · arxiv.org · 6 days ago