llama + spec: MTP Support by am17an · Pull Request #22673
Overview: This PR adds support for MTP (Multi-Token Prediction) heads. I tested this on Qwen3.6 27B, but in principle it should work for any MTP model. I've posted the detailed results below, bu...