Qwen3.6 MTP is 0.3 GB larger but gives a ~2x speedup: from 60 t/s to 130 t/s for Qwen3.6 27B, with no distortion of the output

Covers 4 related stories

huggingface.co

Hugging Face – Fun chat with your own Artificial Intelligence

Discussed on r/webdev and DEV

llama + spec: MTP Support by am17an · Pull Request #22673

Discussed on Hacker News and r/LocalLLaMA

huggingface.co

Used over a million tokens in three separate sessions to test Qwen 3.6 35b (new Multi-token Prediction version)

Discussed on r/LocalLLaMA

[Speculative decoding] feat: add EAGLE3 speculative decoding support by ichbinhandsome · Pull Request #18039 · ggml-org/llama.cpp

Discussed on r/LocalLLaMA