🎯 Reinforcement Learning - elasticbounce · Scour

Performance Variation in Deep Reinforcement Learning

🧠Active Inference Academic

Reasoning RL in 2026: GRPO, DPO, RLVR, Agentic PO & Beyond

turingpost.com

Researchers develop AI-powered railway control system for efficient urban train operation

🦾Bio inspired robotics

techxplore.com

Q-Learning (Reinforcement learning): Bellman Equation, Markov Decision Processes, Q-Values, and…

⚙️Computational Mechanics Blog

DDPG from Scratch: 400-Line PyTorch Implementation

🧠Synaptic pruning

SimarcLabs/pybullet-swarm-sim: Python framework for simulating drone swarms with PyBullet in seconds.

🐝Collective Intelligence Code

github.com · r/opensource

Test-Time Gradient Guidance of Flow Policies in Reinforcement Learning

🧠Active Inference Academic

Reinforcement Learning and Optimal Control Book (RIP Dimitri Bertsekas)

⚙️Computational Mechanics Academic

web.mit.edu · Hacker News

Towards Shutdownable Agents: Generalizing Stochastic Choice in RL Agents and LLMs

🧠Active Inference

lesswrong.com

Fast and Highly Expressive Policy Learning for Offline Reinforcement Learning via Bootstrapped Flow Q-Learning

🧠Active Inference Academic

Good teachers don’t cheat

📡Information Theory Blog

jasonkena.github.io · Hacker News

Uncertainty-Aware LLM-Guided Policy Shaping for Sparse-Reward Reinforcement Learning

🧠Active Inference Academic

Geometrically Averaged Hard Target Updates for Linear Q-Learning

📐Information geometry Academic

SocraticPO: Policy Optimization via Interactive Guidance

🔄Continual Learning Academic

Event-Driven Reinforcement Learning Enables Long-Horizon Control in Semiconductor Fabrication

🌐Complex Systems Academic

TT-DAC-PS: Twin-Target Deterministic Actor-Critic with Policy Smoothing for Optimal Trade Execution

🧬Evolutionary Computation Academic

3SPO: State-Score-Supervised Policy Optimization for LLM Agents

🧬Evolutionary Computation Academic

Towards End to End Motion Planning and Execution for Autonomous Underwater Vehicles Using Reinforcement Learning

🔄Continual Learning Academic

SHAPO: Sharpness-Aware Policy Optimization for Safe Exploration

🧠Active Inference Academic

Variational Proximal Policy Optimization

🧠Active Inference Academic
