Scalable End-to-End Interpretability
lesswrong.com·2h

Published on December 18, 2025 10:37 PM GMT

This is partly a linkpost for Predictive Concept Decoders, and partly a response to Neel Nanda’s Pragmatic Vision for AI Interpretability and Leo Gao’s Ambitious Vision for Interpretability.

There is currently something of a debate in the interpretability community between pragmatic interpretability, which grounds problems in empirically measurable safety tasks, and ambitious interpretability, which aims at obtaining a full bottom-up understand…
