🎯 Reinforcement Learning - daemsc · Scour

Policy Gradient for Continuous-Time Robust Markov Decision Processes

🤖AI Engineering Academic

Q-Learning (Reinforcement learning): Bellman Equation, Markov Decision Processes, Q-Values, and…

🤖AI Engineering Blog


Researchers develop AI-powered railway control system for efficient urban train operation

🤖AI Engineering

techxplore.com

Agents Need Work Data: A Primer on RLWD, or Reinforcement Learning on Work Data

🤖AI Engineering

anjalishriva.com · Hacker News

SimarcLabs/pybullet-swarm-sim: Python framework for simulating drone swarms with PyBullet in seconds.

🤖AI Engineering Code

github.com · r/opensource

Scale Robot Reinforcement Learning with NVIDIA Isaac Lab on Amazon SageMaker AI

🤖AI Engineering Blog

aws.amazon.com

Reinforcement Learning and Optimal Control Book (RIP Dimitri Bertsekas)

🤖Robotics Academic

web.mit.edu · Hacker News

AI Agent Mastery & Coaching

🤖AI Engineering

Some Interesting Papers on RLVR

🧠LLM Research

lesswrong.com

Reasoning RL in 2026: GRPO, DPO, RLVR, Agentic PO & Beyond

🧠LLM Research

turingpost.com

Reinforcement learning in linear embedding space unlocks generalizable control across soft robot configurations

🤖Robotics Academic

Cohere open-sources a coding agent that runs on a single H100

🧠LLM Research

venturebeat.com

DDPG from Scratch: 400-Line PyTorch Implementation

🧠LLM Research

Social Intelligence Arises Between Minds

psychologytoday.com

Good teachers don’t cheat

🛡️AI Safety Blog

jasonkena.github.io · Hacker News

I got so mad at poke(rogue)like that I trained a RL agent to beat it for me

🤖AI Engineering

thiagolira.blot.im · Hacker News

Microsoft just shared the frontier data engineering secrets

🔮Multimodal AI

mail.bycloud.ai

Fast and Highly Expressive Policy Learning for Offline Reinforcement Learning via Bootstrapped Flow Q-Learning

🤖AI Engineering Academic

A Human-Augmenting Agentic Workflow for Causal Inference

🤖AI Engineering Blog

netflixtechblog.medium.com

See, Act, Correct: three levers for working with a code agent

🧠LLM Research Blog

blog.owulveryck.info · Hacker News
