⚙ post training infra - moyutianzun · Scour

Reasoning RL in 2026: GRPO, DPO, RLVR, Agentic PO & Beyond

turingpost.com

Which LoRA? An Empirical Study on the Effectiveness of LoRA Techniques During Multilingual Instruction Tuning

🎛️Fine-Tuning Academic

Fine-tuning Large Language Models (LLMs) using PEFT

🎛️Fine-Tuning Blog

Tracing Eval-Awareness Emergence Through Training of OLMo 3

lesswrong.com

brunokeymolen/lora: LoRa (Long Range) communication related projects

🎛️Fine-Tuning Code

github.com · Hacker News

Meshcore and Haiku: a Match Apparently Made in Italy

🎛️Fine-Tuning

Less-relevant results

[NEW MODEL] SupraLabs just released Supra1.5-50M Base (Experimental)!

🎛️Fine-Tuning

huggingface.co · r/LocalLLaMA

SecLoRA: Secure Aggregation of Low-Rank Matrix Products via Functional Encryption

🎛️Fine-Tuning

eprint.iacr.org

The week AI infrastructure crossed from a technology story to a financial one

🎯RLHF News

Robust Multi-Mutant Protein Stability Prediction from a Fine-Tuned Evolutionary Scale Model

🎛️Fine-Tuning Academic

New comment by bkjlblh in "Claude Fable 5"

🎛️Fine-Tuning Discussion

news.ycombinator.com · Hacker News

fc2

🎛️Fine-Tuning

Anthropic Apologizes For One of the Guardrails on Its Fable 5 Model, and Will Change It

🎛️Fine-Tuning

Replicate vs Gemini API: An Honest Cost Breakdown of Photo Generation (Real Production Numbers)

🎛️Fine-Tuning Blog

Introducing the Google Colab CLI

🎛️Fine-Tuning Blog

developers.googleblog.com

If Claude Fable stops helping you, you’ll never know

🎛️Fine-Tuning

simonwillison.net · Hacker News

Fine-tuning Multi-modal LLMs with ART: Art-based Reinforcement Training

🎛️Fine-Tuning Academic

Anthropic releases Claude Fable 5 and Mythos 5 with major gains in coding and science

🎛️Fine-Tuning

Google Colab CLI opens runtimes to Claude Code and Codex

🎛️Fine-Tuning

helpnetsecurity.com · r/ClaudeAI

Why LLMs (still) lack taste

beyondtheprior.com · Hacker News
