🎮 Reinforcement Learning - buckman

🤖AI Academic

web.mit.edu··Hacker News

Some Interesting Papers on RLVR

📐Linear Algebra

lesswrong.com·

Q-Learning (Reinforcement learning): Bellman Equation, Markov Decision Processes, Q-Values, and…

🎯RLHF Blog

medium.com

Scale Robot Reinforcement Learning with NVIDIA Isaac Lab on Amazon SageMaker AI

🟩Nvidia Blog

aws.amazon.com·

Deep Learning Weekly: Issue 458

🤖Large Language Models

deeplearningweekly.com·

SimarcLabs/pybullet-swarm-sim: Python framework for simulating drone swarms with PyBullet in seconds.

📊Data Visualization Code

github.com··r/opensource

Prompt Injection Defense Pipeline

🛡️LLM Security

emergentmind.com·

Direct Preference Optimization Beyond Chatbots

🎯RLHF Blog

huggingface.co··Hacker News

Location: Göttingen, Germany Remote: Yes (preferred; hybrid also fine) Willing t...

🤖Large Language Models Discussion

news.ycombinator.com··Hacker News

Good teachers don’t cheat

🧮Complexity Theory Blog

jasonkena.github.io··Hacker News

AI Paper Review: Training Language Models to Follow Instructions with Human Feedback (InstructGPT)

🤖GenAI

freecodecamp.org·

SLUUG Talk: Demystifying Large Language Models on Linux

🤖GenAI Code

github.com··DEV

DDPG from Scratch: 400-Line PyTorch Implementation

🤖AI

tildalice.io·

Nvidia Nemotron 3 Ultra

🤖AI

research.nvidia.com··Hacker News

Towards Shutdownable Agents: Generalizing Stochastic Choice in RL Agents and LLMs

🧠LLM

lesswrong.com·

The Sycophancy Problem: Why AI Can’t Stop Agreeing With You

🤖AI

moroccoworldnews.com·

umair-tareen/philosopher-council: An eleven-philosopher LLM council - ask it questions or point it at AI-research trends. Claude-powered deliberation through the four classical branches of philosophy. Methodology, not metaphysics.

🏛️Philosophy Code

github.com··r/SideProject

My research agenda and work

🤖AI

lesswrong.com·

Improve your agent’s tool-calling accuracy with SFT and DPO on Amazon SageMaker AI

🎯RLHF Blog

aws.amazon.com·

Reasoning RL in 2026: GRPO, DPO, RLVR, Agentic PO & Beyond

Reinforcement Learning and Optimal Control Book (RIP Dimitri Bertsekas)
