Reinforcement Learning


Researchers trained an open source AI search agent, Harness-1, that outperforms GPT-5.4 on recalling relevant information

vector embedding
venturebeat.com · Hacker News

How to reduce capability degradation from off-model SFT

Hardware Security
lesswrong.com

Deep Reinforcement Learning for Adaptive Power Allocation in ISAC Systems with Mobile Target

馃TransformersContent type: Academic
arxiv.org

SimarcLabs/pybullet-swarm-sim: Python framework for simulating drone swarms with PyBullet in seconds.

Symbolic Execution · Content type: Code
github.com · r/opensource

Import AI 460: Reward hacking society, RSI data from Anthropic; and RL-based quadcopter racing

Symbolic Execution · Content type: News · Content type: Blog

AI-powered living business intelligence network

馃Transformers
atlasforgex.com
Hacker News

Multi-Agent Reinforcement Learning from Delayed Marketplace Feedback for Objective-Weight Adaptation in Three-Sided Dispatch

Distributed Systems · Content type: Academic
arxiv.org

Reinforcement Learning Disrupts Gradient-Based Adversarial Optimization

馃TransformersContent type: Academic
arxiv.org

SENTINEL: Failure-Driven Reinforcement Learning for Training Tool-Using Language Model Agents

Symbolic Execution · Content type: Academic
arxiv.org

Reinforcement Learning for Neural Model Editing

馃TransformersContent type: Academic
arxiv.org

Stubborn: A Streamlined and Unified Reinforcement Learning Framework for Robust Motion Tracking and Fall Recovery for Humanoids

Symbolic Execution · Content type: Academic
arxiv.org

QoS Improvement in Multi User Cellular-Symbiotic Radio Network Assisted by Active-STAR-RIS

Distributed Systems · Content type: Academic
arxiv.org

HERO: Hindsight-Enhanced Reflection from Environment Observations for Agentic Self-Distillation

Symbolic Execution · Content type: Academic
arxiv.org

When Context Returns: Toward Robust Internalization in On-Policy Distillation

馃TransformersContent type: Academic
arxiv.org

Keep Policy Gradient in Charge: Sibling-Guided Credit Distillation for Long-Horizon Tool-Use Agents

Symbolic Execution · Content type: Academic
arxiv.org

Architecture-Aware Reinforcement Learning Makes Sliding-Window Attention Competitive in Math Reasoning

馃TransformersContent type: Academic
arxiv.org

Seeing Before Colliding: Anticipatory Safe RL with Frozen Vision-Language Models

Symbolic Execution · Content type: Academic
arxiv.org

Fast and Highly Expressive Policy Learning for Offline Reinforcement Learning via Bootstrapped Flow Q-Learning

Symbolic Execution · Content type: Academic
arxiv.org

TT-DAC-PS: Twin-Target Deterministic Actor-Critic with Policy Smoothing for Optimal Trade Execution

Symbolic Execution · Content type: Academic
arxiv.org

Analyzing and Improving Fine-grained Preference Optimization in Medical LVLMs

vector embedding · Content type: Academic
arxiv.org