Weak-To-Strong Generalization
lesswrong.com

Published on November 2, 2025 2:45 AM GMT

I will be discussing weak-to-strong generalization with Sahil on Monday, November 3rd, 2025, at 11am Pacific Time. You can join the discussion with this link.

Weak-to-strong generalization is an approach to alignment (and capabilities) which seeks to address the scarcity of human feedback by using a weak model to teach a strong model. This is similar to Paul Christiano’s iterated distillation and amplification (IDA), but without the “amplification” step: the strong model is trained …
