🎮 Reinforcement Learning
RL, RLHF, reward model, policy gradient
Essential role of self-interaction correction in single-atom catalysis: From electronic structure to activity predictions
📐 Scaling Laws · link.aps.org · 1 day ago
I got so mad at poke(rogue)like that I trained a RL agent to beat it for me
⚙️ Model Training · thiagolira.blot.im · 3 days ago · Hacker News
Deep Reinforcement Learning for Adaptive Power Allocation in ISAC Systems with Mobile Target
🔄 Transformers · Content type: Academic · arxiv.org · 12 hours ago
Reinforcement learning in linear embedding space unlocks generalizable control across soft robot configurations
📐 Scaling Laws · Content type: Academic · nature.com · 3 days ago
Reinforcement Learning Disrupts Gradient-Based Adversarial Optimization
📉 Deep Learning · Content type: Academic · arxiv.org · 12 hours ago
Deterministic Policy Gradient for Learning Equilibrium in Time-Inconsistent Control Problems
📈 Quantitative Finance · Content type: Academic · arxiv.org · 12 hours ago
KJLdefeated/RL.cu: RLVR training for LLM in CUDA/C++
🖥️ ML Systems · Content type: Code · github.com · 4 days ago · Hacker News
Generalization Hacking: Models Can Game Reinforcement Learning by Preventing Behavioral Generalization
🔄 Transformers · Content type: Academic · arxiv.org · 12 hours ago
UniIntervene: Agentic Intervention for Efficient Real-World Reinforcement Learning
🤖 AI Agents · Content type: Academic · arxiv.org · 12 hours ago
Harnessing Routing Foresight for Micro-step-level MoE load balancing in RL Post-training
⚙️ Model Training · Content type: Academic · arxiv.org · 12 hours ago
Phi-Actor-Critic: Steering General-Sum Games to Pareto-Efficient Correlated Equilibria
🤖 AI Agents · Content type: Academic · arxiv.org · 12 hours ago
Mitigating Bias in Low-SNR Financial Reinforcement Learning via Quantum Representations
📈 Quantitative Finance · Content type: Academic · arxiv.org · 1 day ago
APPO: Agentic Procedural Policy Optimization
🤖 AI Agents · Content type: Academic · arxiv.org · 12 hours ago
Towards End to End Motion Planning and Execution for Autonomous Underwater Vehicles Using Reinforcement Learning
📉 Deep Learning · Content type: Academic · arxiv.org · 2 days ago
Seeing Before Colliding: Anticipatory Safe RL with Frozen Vision-Language Models
🔍 Interpretability · Content type: Academic · arxiv.org · 12 hours ago
A Unifying Lens on Reward Uncertainty in RLHF
⚙️ Model Training · Content type: Academic · arxiv.org · 2 days ago
Plan-and-Verify Video Reward Reasoning with Spatio-Temporal Scene Graph Grounding
🔄 Transformers · Content type: Academic · arxiv.org · 12 hours ago
Fast and Highly Expressive Policy Learning for Offline Reinforcement Learning via Bootstrapped Flow Q-Learning
🧠 AI Research · Content type: Academic · arxiv.org · 1 day ago
Critic Architecture Matters: Dual vs. Unified Critics for Humanoid Loco-Manipulation
⚙️ Model Training · Content type: Academic · arxiv.org · 12 hours ago
Performance Variation in Deep Reinforcement Learning
📉 Deep Learning · Content type: Academic · arxiv.org · 3 days ago