Tyler: Typed Latent Reasoning for Language Models -- When to Think, What to Compute, and How Much to Allocate
Chain-of-thought (CoT) prompting improves reasoning in large language models (LLMs) by externalizing intermediate computation as discrete text tokens, but this textual interface also introduces redundancy and inference overhead. Latent reasoning offers a promising alternative by carrying part of the computation in continuous representations. However, existing methods typically predefine when latent computation is invoked and how it is allocated ...