🎓 RLHF - SeanNg · Scour

When RL Fails after SFT: Rejuvenating Model Plasticity for Robust SFT-to-RL Handoff

🎮Reinforcement Learning Academic

Reasoning RL in 2026: GRPO, DPO, RLVR, Agentic PO & Beyond

🎮Reinforcement Learning

turingpost.com

Tracing Eval-Awareness Emergence Through Training of OLMo 3

🎯Fine-tuning

lesswrong.com

Less-relevant results

ApodexAI/AgentHarness: Evaluation harness for Apodex-1.0 on public deep-research benchmarks.

🧠OpenAI Code

github.com · Hacker News

Nvidia Nemotron 3 Ultra

research.nvidia.com · Hacker News

Researchers trained an open source AI search agent, Harness-1, that outperforms GPT-5.4 on recalling relevant information

🎮Reinforcement Learning

venturebeat.com · Hacker News

Why Claude Produces High-Quality Output: A Developer’s Guide to Token Efficiency and Hallucination…

🎭Anthropic Claude Blog

SFT & the Locus Awards

sfintranslation.com

DiffusionGemma: The Developer Guide

🎯Fine-tuning Blog

developers.googleblog.com

From 1 July, the AP will check the registration of scan cars in the algorithm register

🎯Fine-tuning

autoriteitpersoonsgegevens.nl

How to reduce capability degradation from off-model SFT

✍️Prompt Engineering

lesswrong.com

local AI agents for Cursor with pre-tuned marketplace/commu

locaible.com · Hacker News

Mult-DPO: Multinomial Direct Preference Optimization for Recommender Systems

🎯Fine-tuning Academic

Stack Overflow didn't just help AI learn to code

zozo123.github.io · Hacker News

AI2's Nathan Lambert says Nvidia's multi-teacher on-policy distillation for Nemotron 3 Ultra is the post-training industry standard

Why LLMs (still) lack taste

beyondtheprior.com · Hacker News

GDPR request

🎯Fine-tuning

wiki.openfoodfacts.org

Vibe Diaries: Training Nanochat

🔤Tokenization

vibediary.dev · Hacker News

Deep Learning Weekly: Issue 458

deeplearningweekly.com

A Unifying Lens on Supervised Fine-Tuning Through Target Distribution Design

🤖LLM Academic
