Benchmarking llama.cpp's brand-new MTP support on Strix Halo
After llama.cpp merged Multi-Token Prediction (MTP) speculative decoding support, I benchmarked Qwen3.6 27B and 35B-A3B on Strix Halo and an RTX 3090. The result: up to a 2.44× speedup with lossless output. Build-from-source steps are included.