I was curious why MTP affects PP TPS in llama.cpp. My PoC recovers it?
I've been running Qwen3.6-35B-A3B locally on llama.cpp and noticed that prompt processing throughput drops noticeably when MTP (multi-token prediction) is enabled. I got nerd-sniped.