TL;DR: When an LLM reads "here's some text, here's a criterion — does it satisfy it?", the answer often already exists in its hidden state before it generates a single token. So skip generation entirely: grab the hidden state at the last prompt token (~70% of the way up the model's layers), feed it to a tiny MLP, calibrate the output. Because the training data varies the criterion, you get one frozen model that acts as any classifier you can write in English.

Sign in to keep reading the full article.

Cited by 1 article

Dario Amodei policy 🏛️, DiffusionGemma ⚡, WhatsApp to unblock bots 🤖

tldr.tech·