🎮 Reinforcement Learning - buckman · Scour

Reasoning RL in 2026: GRPO, DPO, RLVR, Agentic PO & Beyond

turingpost.com

Improve your agent’s tool-calling accuracy with SFT and DPO on Amazon SageMaker AI

🎯RLHF Blog

aws.amazon.com

Some Interesting Papers on RLVR

📐Linear Algebra

lesswrong.com

The Eval Gap: Your Agent Has Observability but No Idea If It's Any Good

🤖AI Blog

Deep Learning Weekly: Issue 458

🤖Large Language Models

deeplearningweekly.com

You're doing it wrong

🚀Space Exploration News

understandably.com

AI Paper Review: Training Language Models to Follow Instructions with Human Feedback (InstructGPT)

freecodecamp.org

Import AI 460: Reward hacking society, RSI data from Anthropic; and RL-based quadcopter racing

🎭Anthropic Claude News Blog

importai.substack.com · Substack

SLUUG Talk: Demystifying Large Language Models on Linux

🤖GenAI Code

github.com · DEV

Local LLMs, Buy a GPU, and the Case for Cognitive Security

briefing.forwardfuture.ai

Breaking free of a single datacenter: Practical geo-distributed AI operations with the k0smos platforms

☸️K8S Blog

Neglected Basics of AI Alignment

🛡️LLM Security

lesswrong.com

How LLMs Actually Work: A Developer's Mental Model

🧠LLM Blog

Scale Robot Reinforcement Learning with NVIDIA Isaac Lab on Amazon SageMaker AI

🟩Nvidia Blog

aws.amazon.com

Posting for authoring

turingpost.com

Towards Shutdownable Agents: Generalizing Stochastic Choice in RL Agents and LLMs

lesswrong.com

BAGEN: LLM Agents Waste 44% of Tokens on Tasks They'll Fail

🤖AI Blog

My research agenda and work

lesswrong.com

LLM Fine-Tuning vs RAG: A Production Decision Framework for Engineering Teams

🧠LLM Training Blog

Human-Aligned Decision Transformers for satellite anomaly response operations with inverse simulation verification

🤖Large Language Models Blog
