Reinforcement Learning

Failure Modes of Deep Multi-Agent RL in Asynchronous Pricing: Reproducible Triggers, Trace Diagnostics, and a Partial Fix

馃AI AgentsContent type: Academic
arxiv.org
Less-relevant results

Best explanations of how LLMs work

馃LLMsContent type: Blog

Towards End to End Motion Planning and Execution for Autonomous Underwater Vehicles Using Reinforcement Learning

馃RoboticsContent type: Academic
arxiv.org

Event-Driven Reinforcement Learning Enables Long-Horizon Control in Semiconductor Fabrication

馃AI AgentsContent type: Academic
arxiv.org

UniIntervene: Agentic Intervention for Efficient Real-World Reinforcement Learning

馃AI AgentsContent type: Academic
arxiv.org

KJLdefeated/RL.cu: RLVR training for LLM in CUDA/C++

Deep Learning · Content type: Code
github.com · Hacker News

UNIQ: Conformal Calibration for Adaptive Conservatism in Offline Reinforcement Learning

Fine-tuning · Content type: Academic
arxiv.org

Generalization Hacking: Models Can Game Reinforcement Learning by Preventing Behavioral Generalization

馃AI AgentsContent type: Academic
arxiv.org

Self-Paced Curriculum Reinforcement Learning for Autonomous Superbike Racing in Simulation

馃AI AgentsContent type: Academic
arxiv.org

Show HN: The Deterministic Core Architecture for AI-Augmented Applications

Context Windows

Cooperative Long Rope Skipping via Multi-Agent Reinforcement Learning

馃RoboticsContent type: Academic
arxiv.org

SVoT: State-aware Visualization-of-Thought for Spatial Reasoning via Reinforcement Learning

Digital Gardens · Content type: Academic
arxiv.org

GIFT: LLM-Guided State-Reward Interface for Financial Reinforcement Learning

馃LLMContent type: Academic
arxiv.org

HERO: Hindsight-Enhanced Reflection from Environment Observations for Agentic Self-Distillation

馃AI AgentsContent type: Academic
arxiv.org

MoQ GGUFs and GSQ: Low-Bit GGUFs Are About to Get Much Better

馃LLM InferenceContent type: NewsContent type: Blog

Deep reinforcement learning for process design: Review and perspective

Deep Learning · Content type: Academic
arxiv.org

APPO: Agentic Procedural Policy Optimization

馃AI AgentsContent type: Academic
arxiv.org

Introducing the Third Generation of Apple's Foundation Models

馃LLMs

Path Planning Using Deep Deterministic Policy Gradient: A Reinforcement Learning Approach

馃AI AgentsContent type: Academic
arxiv.org

A Unifying Lens on Reward Uncertainty in RLHF

馃LLMContent type: Academic
arxiv.org
