Most large language models predict answers by guessing the single best word to say next, then the next, and so on... 🔎 (opens in new tab)

Most large language models predict answers by guessing the single best word to say next, then the next, and so on... 🔎 It's highly capable, but not necessarily fast. The model waits to finish one word before it can think about the next. DiffusionGemma skips the wait. It uses "diffusion" to generate text by refining noise step by step — drafting and error-correcting whole blocks simultaneously. This makes it incredibly fast, and helpful for editing complex math and code. Video

Read the original article