Don't let the LLM speak, just probe it (8 minute read) (opens in new tab) 🤖LLMs
TL;DR: When an LLM reads "here's some text, here's a criterion — does it satisfy it?", the answer often already exists in its hidden state before it generates a single token. So skip generation entirely: grab the hidden state at the last prompt token (~70% of the way up the model's layers), feed it to a tiny MLP, calibrate the output. Because the training data varies the criterion, you get one frozen model that acts as any classifier you can write in English.
Read the original article