🎯 Post-training - samveed · Scour

Emergence of Context Characteristics Sensitivity in Large Language Models

🌐World Models Academic

Reasoning RL in 2026: GRPO, DPO, RLVR, Agentic PO & Beyond

turingpost.com

Sequent: scale and automation for higher confidence in alignment

lesswrong.com

Why LLMs (still) lack taste

beyondtheprior.com · Hacker News

The week AI infrastructure crossed from a technology story to a financial one

💬LLMs News

Deep Learning Weekly: Issue 458

deeplearningweekly.com

Introducing North Mini Code: Cohere’s First Model For Developers

🌐World Models Blog

huggingface.co · Hacker News

KJLdefeated/RL.cu: RLVR training for LLM in CUDA/C++

📊ML Code

github.com · Hacker News

Less-relevant results

Vibe Diaries: Training Nanochat

vibediary.dev · Hacker News

Researchers trained an open source AI search agent, Harness-1, that outperforms GPT-5.4 on recalling relevant information

🌐World Models

venturebeat.com · Hacker News

Compatibility-Aware Dynamic Fine-Tuning for Large Language Models

🎮RL Academic

DiffusionGemma: The Developer Guide - Google Developers Blog

💬LLMs Blog

developers.googleblog.com · r/LocalLLaMA

NVIDIA Nemotron 3 Ultra Powers Faster, More Efficient Reasoning for Long-Running Agents

🌐World Models Blog

developer.nvidia.com · Hacker News

GPT-2: Too Dangerous To Release (2019)

💬LLMs Blog

naokishibuya.github.io · Hacker News

SFT & the Locus Awards

sfintranslation.com

Tracing Eval-Awareness Emergence Through Training of OLMo 3

🏋️Pretraining

lesswrong.com

I built a machine that turns AI papers into interactive explainers

🎮RL Blog

Architecture-Aware Reinforcement Learning Makes Sliding-Window Attention Competitive in Math Reasoning

🏋️Pretraining Academic

SLUUG Talk: Demystifying Large Language Models on Linux

🧠AI Code

github.com · DEV

AI2's Nathan Lambert says Nvidia's multi-teacher on-policy distillation for Nemotron 3 Ultra is the post-training industry standard
