Activate Gemma 4 MTP
Update llama first. This was tried with llama b9704:

    llama serve -m "..\gemma-4-12B-it-qat\gemma-4-12B-it-qat-UD-Q4_K_XL.gguf" --spec-type draft-mtp --spec-draft-n-max 2 --spec-draft-model "..\gemma-4-12B-it-qat\mtp-gemma-4-12B-it.gguf"

With MTP enabled, generation speed rose from 10 to 14.8 tokens per second. One suggestion is to try --spec-draft-n-max 3 for coding workloads.
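To put the reported numbers in perspective, here is a small sketch of the arithmetic. The two throughput figures come from the measurement above; the 1000-token response length is a hypothetical value chosen only for illustration, and real results will vary with hardware and prompt.

```python
# Throughput figures reported in the post.
baseline_tps = 10.0  # tokens per second without MTP
mtp_tps = 14.8       # tokens per second with MTP (--spec-draft-n-max 2)

# Relative speedup and percentage gain.
speedup = mtp_tps / baseline_tps
percent_gain = (speedup - 1.0) * 100.0

# Hypothetical response length, used only to illustrate wall-clock impact.
response_tokens = 1000
baseline_seconds = response_tokens / baseline_tps
mtp_seconds = response_tokens / mtp_tps

print(f"speedup: {speedup:.2f}x ({percent_gain:.0f}% faster)")
print(f"{response_tokens} tokens: {baseline_seconds:.1f}s -> {mtp_seconds:.1f}s")
```

In other words, the reported gain is roughly 1.48x, which would cut a 1000-token generation from about 100 seconds to about 68 seconds under these assumptions.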