Reinforcement Learning

Reinforcement Learning from Rich Feedback with Distributional DAgger

Optimization | Content type: Academic
arxiv.org

OrderGrad: Optimizing Beyond the Mean with Order-Statistic Policy Gradient Estimation

Optimization | Content type: Academic
arxiv.org

Agentic Monte Carlo: Simulating Reinforcement Learning for Black-Box Agents

Cellular Automata | Content type: Academic
arxiv.org

AgentJet: A Flexible Swarm Training Framework for Agentic Reinforcement Learning

Prompt Engineering | Content type: Academic
arxiv.org

Policy-Conditioned Counterfactual Credit for Verifiable Reinforcement Learning of Long-Horizon Language Agents

LLMs | Content type: Academic
arxiv.org

A Goal-Set Characterization of Task Composition in the Boolean Task Algebra

LLMs | Content type: Academic
arxiv.org

Online KL-Regularized Reinforcement Learning with Function Approximation under Misspecification

Optimization | Content type: Academic
arxiv.org

Read the Trace, Steer the Path: Trajectory-Aware Reinforcement Learning for Diffusion Language Models

LLMs | Content type: Academic
arxiv.org

Alpha-RTL: Test-Time Training for RTL Hardware Optimization

LLMs | Content type: Academic
arxiv.org

Position: Deployed Reinforcement Learning should be Continual

Cellular Automata | Content type: Academic
arxiv.org

Semi-Offline Reinforcement Learning for Optimized Text Generation

LLMs | Content type: Academic
arxiv.org

SALT: When More Rollouts Don't Help in Group-Based Policy Optimization and How to Make Them Matter

LLMs | Content type: Academic
arxiv.org

Maximising the Set-Piece Return: Optimising Football Corner Tactics with Graph Reinforcement Learning

Optimization | Content type: Academic
arxiv.org

Large Language Models Hack Rewards, and Society

Machine Learning | Content type: Academic
arxiv.org

Transformer-Enhanced Reinforcement Learning: Fundamentals and Applications in Communication Networks

Transformers | Content type: Academic
arxiv.org

BiasGRPO: Stabilizing Bias Mitigation in High-Variance Reward Landscapes via Group-Relative Policy Optimization

AI | Content type: Academic
arxiv.org
