Looped Transformers scale latent computation by repeatedly applying shared blocks, but sequential looping increases latency and KV-cache memory with the loop count. Parallel loop Transformers (PLT) alleviate this cost through cross-loop position offsets (CLP) and shared-KV gated sliding-window attention, making loop count a practical design choice. We therefore study PLT loop-count selection through a gain-cost view: an extra loop may refine re...
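The gain-cost framing can be made concrete with a toy model. The sketch below assumes, for illustration only, that in sequential looping every loop keeps its own keys and values for every shared layer, so KV-cache memory grows linearly with the loop count. It then picks a loop count with a simple greedy rule: stop once one more loop no longer improves a quality score by more than a fixed per-loop cost. The function names, the cost unit, the scores, and the stopping rule are all hypothetical. This is not the PLT paper's selection method, and it does not model CLP or the shared-KV gated sliding-window attention.

```python
from typing import Sequence


def kv_cache_bytes(loops: int, num_layers: int, seq_len: int,
                   num_kv_heads: int, head_dim: int, bytes_per_elem: int = 2) -> int:
    """KV-cache size for a *sequentially* looped stack of shared layers.

    Assumption (illustrative): every loop keeps its own keys and values for
    every layer, so memory scales linearly with `loops`.
    """
    # Factor 2 = one tensor for keys plus one for values.
    per_loop = 2 * num_layers * seq_len * num_kv_heads * head_dim * bytes_per_elem
    return loops * per_loop


def choose_loop_count(scores: Sequence[float], cost_per_loop: float) -> int:
    """Hypothetical greedy rule: add loops while marginal gain > marginal cost.

    scores[k] is a made-up quality score measured with k + 1 loops, and
    cost_per_loop is the price of one extra loop in the same units.
    """
    if not scores:
        raise ValueError("need at least one score")
    loops = 1
    for k in range(1, len(scores)):
        if scores[k] - scores[k - 1] <= cost_per_loop:
            break  # one more loop no longer pays for itself
        loops = k + 1
    return loops


if __name__ == "__main__":
    one = kv_cache_bytes(1, num_layers=4, seq_len=1024, num_kv_heads=8, head_dim=128)
    three = kv_cache_bytes(3, num_layers=4, seq_len=1024, num_kv_heads=8, head_dim=128)
    print(one, three)  # 16777216 50331648
    print(choose_loop_count([10, 20, 23, 24], cost_per_loop=5))  # 2
```

Stopping at the first loop that fails to pay for itself assumes diminishing returns. If gains were non-monotone, comparing the net value `scores[k] - (k + 1) * cost_per_loop` across all candidate loop counts would be the safer rule.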


Covered in 1 article

In other languages

ai-brief.liziran.com

LoopCoder-v2: Only Loop Once for Efficient Test-Time Computation Scaling

Two loops take SWE-bench from 43 to 64