🎮 Reinforcement Learning - chris1 · Scour

Software Agents: The management challenge 🤖AI Agents

hypecycles.com·6d

Which one is more important: more parameters or more computation? (2021) 📐ML Theory

parl.ai·6d·Hacker News

Rule-based High-Level Coaching for Goal-Conditioned Reinforcement Learning in Search-and-Rescue UAV Missions Under Limited-Simulation Training 🤖AI Agents

Reddit as a Reinforcement Learning Gym for Persuasion 🤖Machine Learning

·6d

Uncertainty-Aware Reward Discounting for Mitigating Reward Hacking ♟️Game Theory

Fail safe(r) at alignment by channeling reward-hacking into a "spillway" motivation ♟️Game Theory

lesswrong.com·3d

K-Score: Kalman Filter as a Principled Alternative to Reward Normalization in Reinforcement Learning 📐ML Theory

On the Complexity of Robust Markov Decision Processes and Bisimulation Metrics ♟️Game Theory

Deep Policy Iteration for High-Dimensional Mean-Field Games with Regenerative Reformulation 📐ML Theory

Uncertainty-Aware Predictive Safety Filters for Probabilistic Neural Network Dynamics 📐ML Theory

When Errors Can Be Beneficial: A Categorization of Imperfect Rewards for Policy Gradient 🤖AI Agents

reward-lens: A Mechanistic Interpretability Library for Reward Models 📐ML Theory

Addressing Performance Saturation for LLM RL via Precise Entropy Curve Control 💬LLMs

A Reward-Free Viewpoint on Multi-Objective Reinforcement Learning 🤖AI Agents

Application of Deep Reinforcement Learning to Event-Triggered Control for Networked Artificial Pancreas Systems ♟️Game Theory

Entropy Centroids as Intrinsic Rewards for Test-Time Scaling 📐ML Theory

CAPSULE: Control-Theoretic Action Perturbations for Safe Uncertainty-Aware Reinforcement Learning ♟️Game Theory

DORA: A Scalable Asynchronous Reinforcement Learning System for Language Model Training 💬LLMs

RL Token: Bootstrapping Online RL with Vision-Language-Action Models 💬LLMs

Co-Learning Port-Hamiltonian Systems and Optimal Energy-Shaping Control 📐ML Theory
