Reinforcement Learning

Beyond Uniform Token-Level Trust Region in LLM Reinforcement Learning

LLMs | Content type: Academic
arxiv.org

Path Planning Using Deep Deterministic Policy Gradient: A Reinforcement Learning Approach

PyTorch | Content type: Academic
arxiv.org

Representation Learning Enables Scalable Multitask Deep Reinforcement Learning

Deep Learning | Content type: Academic
arxiv.org

Bellman-Taylor Score Decoding for Markov Decision Processes with State-Dependent Feasible Action Sets

Prompt Engineering | Content type: Academic
arxiv.org

Uncertainty-Aware LLM-Guided Policy Shaping for Sparse-Reward Reinforcement Learning

NLP | Content type: Academic
arxiv.org

Dmsh: A Multi-Agent Reinforcement Learning Framework for All-Quad Mesh Generation

Optimization | Content type: Academic
arxiv.org

Enhancing the MADDPG Algorithm for Multi-Agent Learning via Action Inference and Importance Sampling

LLMs | Content type: Academic
arxiv.org

Progress-SQL: Improving Reinforcement Learning for Text-to-SQL via Progressive Rewards

PostgreSQL | Content type: Academic
arxiv.org

Test-Time Gradient Guidance of Flow Policies in Reinforcement Learning

LLMs | Content type: Academic
arxiv.org

On Advantage Estimates for Max@K Policy Gradients

Optimization | Content type: Academic
arxiv.org

Reformulate LLM Reinforcement Learning for Efficient Training under Black-box Discrepancy

LLMs | Content type: Academic
arxiv.org

On-sky demonstration of reinforcement learning for adaptive optics control

Machine Learning | Content type: Academic
arxiv.org

Belief-Space Quantum-Inspired Reinforcement Learning for Partially Observable Autonomous Cyber Defense in the Internet of Vehicles

Network Security | Content type: Academic
arxiv.org

Reasoning or Memorization? Direction-Aware Diversity Exploration in LLM Reinforcement Learning

Prompt Engineering | Content type: Academic
arxiv.org

COP-Q: Safety-First Reinforcement Learning for Robot Control via Cholesky-Ordered Projection

Anthropic Claude | Content type: Academic
arxiv.org

QnRL: Quantum-Native Reinforcement Learning

Ollama | Content type: Academic
arxiv.org

Constrained Deep Reinforcement Learning for Cognitive Radar Resource Management

Deep Learning | Content type: Academic
arxiv.org

Reinforcement Learning for Flow-Matching Policies with Density Transport

Optimization | Content type: Academic
arxiv.org

Cooperative Long Rope Skipping via Multi-Agent Reinforcement Learning

AI | Content type: Academic
arxiv.org

Trace-Mediated Peak Bias: Bridging Temporal Credit Assignment and Cognitive Heuristics in Deep Reinforcement Learning

Optimization | Content type: Academic
arxiv.org
