Gemma 4 12B QAT (dense) achieves 1000+ tokens/sec prefill on 8GB VRAM with 120k context (opens in new tab)

Gemma 4 12B QAT (dense) achieves 1000+ tokens/sec prefill on 8GB VRAM with 120k context Gemma 4 12B QAT (dense), TurboQuant (Without MTP), RTX 4060 8GB VRAM: Prefill: 1000+ tok/s (42% increase) Decode: 25+ tok/s (25% increase) Context: 120k (150% increase) prefill was 700 tok/sec and decode 20 tok/sec with only 48k context without turbo quant (older test with mtp link in the comments) llama.cpp TurboQuant flags: -m gemma-4-12B-it-qat-UD-Q4_K_XL.gguf -c 120000 --cache-type-k q8_0 --cache-type-...

Read the original article
Sign in to keep reading the full article.

Keyboard Shortcuts

Navigation

Next / previous post
j/k
Open post
oorEnter
Preview post
v

Post Actions

Love post
a
Like post
l
Dislike post
d
Undo reaction
u
Save / unsave
s

Recommendations

Add interest / feed
Enter
Not interested
x

Go to

Home
gh
Interests
gi
Feeds
gf
Likes
gl
History
gy
Changelog
gc
Settings
gs
Discover
gb
Search
/

General

Show this help
?
Submit feedback
!
Close modal / unfocus
Esc

Press ? anytime to show this help