Prompt processing vs. generation: two phases, opposite bottlenecks (opens in new tab)
The same machine can rip through generating tokens yet crawl when it reads a long prompt — or the reverse. That's because local LLMs run in two phases with opposite bottlenecks. Understand them and you'll know exactly which hardware spec to buy for.
Read the original article