Overview

This PR adds support for MTP (Multi Token Prediction) heads. I tested this on Qwen3.6 27B but in principle it should work for any MTP model. I've posted the detailed results below, bu...
