🎮 Reinforcement Learning - wxx · Scour

Variational Proximal Policy Optimization

💾Agent Memory Academic

Reasoning RL in 2026: GRPO, DPO, RLVR, Agentic PO & Beyond

turingpost.com

Researchers develop AI-powered railway control system for efficient urban train operation

techxplore.com

Q-Learning (Reinforcement learning): Bellman Equation, Markov Decision Processes, Q-Values, and…

♟️Game Theory Blog

Reinforcement Learning and Optimal Control Book (RIP Dimitri Bertsekas)

♟️Game Theory Academic

web.mit.edu · Hacker News

Fast and Highly Expressive Policy Learning for Offline Reinforcement Learning via Bootstrapped Flow Q-Learning

🤖AI Agents Academic

Geometrically Averaged Hard Target Updates for Linear Q-Learning

🧠LLMs Academic

The Neutral Mask: How RLHF Provides Shallow Alignment while Leaving Partisan Structure Intact in a Large Language Model

🧠LLMs Academic

Representation-Aware Advantage Estimation: Your Reward Model Provides More Than A Scalar Output

🧠LLMs Academic

Multilingual Sentiment Aware Text Summarization A Reinforcement Learning Approach for Consistency Maintenance

🧠LLMs Academic

Policy Gradient for Continuous-Time Robust Markov Decision Processes

🧠LLMs Academic

A Unifying Lens on Reward Uncertainty in RLHF

🧠LLMs Academic

Discovering Interpretable Multi-Parameter Control Policies for Evolutionary Algorithms Using Deep Reinforcement Learning

🤖AI Agents Academic

A Regret Minimization Framework on Preference Learning in Large Language Models

🧠LLMs Academic

Geometry-Aware Reinforcement Learning for 2D Irregular Nesting

🤖AI Agents Academic

Self-Distilled Policy Gradient

📡Information Theory Academic

Development of COVID-19 Booster Vaccine Policy by Microsimulation and Q-learning

♟️Game Theory Academic

UNIQ: Conformal Calibration for Adaptive Conservatism in Offline Reinforcement Learning

🤝AI-Assisted Coding Academic

Sparse Mixture-of-Experts Reward Models Learn Interpretable and Specialized Experts for Personalized Preference Modeling

🧠LLMs Academic

Flow-DPPO: Divergence Proximal Policy Optimization for Flow Matching Models

🤖AI Agents Academic
