Generalization Dynamics of LM Pre-Training

People typically assume that LMs stably mature from pattern-matching parrots to generalizable intelligence during pre-training. We build a toy eval suite and show this mental model is wrong: throughout pre-training, LMs frequently and suddenly hop between parrot-like and intelligence-like modes, i.e. distinct algorithms implemented by distinct circuits. We call this mode-hopping. Across our suite, LMs can suddenly latch onto memorized or in-context patterns instead of in-context learning, use...
