Fast and Accurate Causal Parallel Decoding using Jacobi Forcing
hao-ai-lab.github.io

TL;DR: Today’s best LLMs mostly decode autoregressively from left to right, which gives great quality but is terribly slow. Diffusion LLMs can decode many tokens in parallel thanks to their non-causal, any-order generation, but they must be trained from scratch or heavily adapted from autoregressive (AR) checkpoints with a non-causal diffusion objective; we find this mismatch often hurts quality and breaks many effective KV-cache-related serving optimizations. This blog introduces Jacobi Forcing, a new training technique that converts LLMs into native causal parallel decoders. Jacobi Forcing keeps the causal AR backbone and fixes the AR-to-diffusion mismatch by training the model to handle noisy future blocks along its own Jacobi decoding trajectories. This yields an AR model whi…
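To make the idea of a Jacobi decoding trajectory concrete, here is a minimal sketch of greedy Jacobi decoding for one block. It assumes a HuggingFace-style causal LM whose forward pass returns `logits` of shape `[batch, seq, vocab]`; the function names and the draft initialization are illustrative, not the post's actual implementation. A draft block is refined in parallel each iteration until it stops changing; because the model is causal, the fixed point matches greedy left-to-right decoding.

```python
import torch

@torch.no_grad()
def jacobi_decode_block(model, prefix_ids, block_len, max_iters=32):
    """Greedy Jacobi decoding of one future block (illustrative sketch).

    prefix_ids: [batch, prefix_len] tokens already committed.
    Returns a [batch, block_len] block equal to what greedy
    autoregressive decoding would produce, once a fixed point is hit.
    """
    prefix_len = prefix_ids.size(1)
    # Initialize the draft block arbitrarily, e.g. by repeating the last token.
    draft = prefix_ids[:, -1:].repeat(1, block_len)
    for _ in range(max_iters):
        inputs = torch.cat([prefix_ids, draft], dim=1)
        logits = model(inputs).logits  # causal: position t attends only to tokens < t
        # Predictions for all block positions, computed in one parallel pass:
        # logits at index t predict token t+1, so the draft positions
        # prefix_len .. prefix_len+block_len-1 are predicted by
        # logits[prefix_len-1 : -1].
        preds = logits[:, prefix_len - 1 : -1, :].argmax(dim=-1)
        if torch.equal(preds, draft):  # fixed point: trajectory has converged
            break
        draft = preds
    return draft
```

Each intermediate `draft` along this loop is one point on the Jacobi trajectory; as described above, Jacobi Forcing trains the model on such noisy intermediate blocks so that more positions become correct per iteration.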
