Can activation verbalizers surface an internal chain of thought?
We introduce an evaluation for activation verbalizers: can they surface a target model's reasoning as it solves a math problem in a single forward pass? For open-weight NLAs, the answer seems to be: "possibly, but definitely not reliably".

Lots of important capabilities currently require AI models to reason "out loud" in a natural-language chain of thought, which means that we can … Some interpretability tools might offer such an affordance. In particular, an activation verbalizer (AV) takes a r...