⚙ post training infra - moyutianzun · Scour

Reasoning RL in 2026: GRPO, DPO, RLVR, Agentic PO & Beyond

turingpost.com

Which LoRA? An Empirical Study on the Effectiveness of LoRA Techniques During Multilingual Instruction Tuning

🎛️Fine-Tuning Academic

Fine-tuning Large Language Models (LLMs) using PEFT

🎛️Fine-Tuning Blog

Tracing Eval-Awareness Emergence Through Training of OLMo 3

lesswrong.com

brunokeymolen/lora: LoRa (Long Range) communication related projects

🎛️Fine-Tuning Code

github.com · Hacker News

Meshcore and Haiku: a Match Apparently Made in Italy

🎛️Fine-Tuning

SecLoRA: Secure Aggregation of Low-Rank Matrix Products via Functional Encryption

🎛️Fine-Tuning

eprint.iacr.org

Fine-tune FLUX.2 [Klein] with a LoRA under 60 minutes

🎛️Fine-Tuning Blog

huggingface.co · Hacker News

Less-relevant results

The week AI infrastructure crossed from a technology story to a financial one

🎯RLHF News

New comment by bkjlblh in "Claude Fable 5"

🎛️Fine-Tuning Discussion

news.ycombinator.com · Hacker News

Anthropic Apologizes For One of the Guardrails on Its Fable 5 Model, and Will Change It

🎛️Fine-Tuning

Robust Multi-Mutant Protein Stability Prediction from a Fine-Tuned Evolutionary Scale Model

🎛️Fine-Tuning Academic

Replicate vs Gemini API: An Honest Cost Breakdown of Photo Generation (Real Production Numbers)

🎛️Fine-Tuning Blog

fc2

🎛️Fine-Tuning

Anthropic's Fable 5 Silent Sabotage Mode

🎛️Fine-Tuning

everettdutton.com · Hacker News

If Claude Fable stops helping you, you’ll never know

🎛️Fine-Tuning

simonwillison.net · Hacker News

Fine-tuning Multi-modal LLMs with ART: Art-based Reinforcement Training

🎛️Fine-Tuning Academic

Introducing the Google Colab CLI

🎛️Fine-Tuning Blog

developers.googleblog.com

Anthropic releases Claude Fable 5 and Mythos 5 with major gains in coding and science

🎛️Fine-Tuning

Why LLMs (still) lack taste

beyondtheprior.com · Hacker News
