🎮 Reinforcement Learning - NonagonGUZZLER · Scour

Mechanistic Analysis of Alignment Algorithms in Language Models

🔄Transformers Academic

On Advantage Estimates for Max@K Policy Gradients

🤖ai Academic

Self-Paced Curriculum Reinforcement Learning for Autonomous Superbike Racing in Simulation

🤖ai Academic

SocraticPO: Policy Optimization via Interactive Guidance

🤖ai Academic

Uncertainty-Aware LLM-Guided Policy Shaping for Sparse-Reward Reinforcement Learning

🤖ai Academic

Retry Policy Gradients in Continuous Action Spaces

🤖ai Academic

Quantum-Inspired Reinforcement Learning for Low-Latency Intrusion Detection in V2X and Internet-of-Vehicles Networks

🤖ai Academic

Reformulate LLM Reinforcement Learning for Efficient Training under Black-box Discrepancy

🔄Transformers Academic

From Ticks to Flows: Dynamics of Neural Reinforcement Learning in Continuous Environments

🤖ai Academic

Merging model-based control with multi-agent reinforcement learning for multi-agent cooperative teaming strategies

🤖ai Academic

Learning to Attack and Defend: Adaptive Red Teaming of Language Models via GRPO

🔄Transformers Academic

One Lens, Many Worlds: A Capability-Typed Interface for World-Model Interpretability

🔄Transformers Academic

Rethinking the Divergence Regularization in LLM RL

🤖ai Academic

Drag reduction or reward hacking? Recurrent multi-agent reinforcement learning that earns its reward

🤖ai Academic

COP-Q: Safety-First Reinforcement Learning for Robot Control via Cholesky-Ordered Projection

🤖ai Academic

LogNEO: A GPT-Neo Reinforcement Learning Framework for Accurate Real-Time Log Anomaly Detection

🤖ai Academic

OrderGrad: Optimizing Beyond the Mean with Order-Statistic Policy Gradient Estimation

🤖ai Academic

Learn to Match: Two-Sided Matching with Temporally Extended Feedback

🔄Transformers Academic

Selective-Advantage Entropy-Adaptive Horizon GRPO: Asymmetric Token-Level Discounting for Efficient Reinforcement Learning of Language Models

🔄Transformers Academic

Q-VGM: Q-Guided Value-Gradient Matching for Flow-Matching VLA Policies

🤖Machine Learning Academic
