Google Shrank Gemma 4 by 72% and Unsloth Fixed the 4-Bit Bug Nobody Else Caught on One 4090, and 4-Bit Shouldn’t Be This Good
Last Updated on June 8, 2026 by Editorial Team
Author(s): Chew Loong Nian – AI ENGINEER
Originally published on Towards AI.

A 26-billion-parameter model has no business fitting in 15 GB of memory and spitting out 193 tokens a second on a single consumer GPU. That is laptop-and-gaming-rig territory, not a datacenter. Yet that is exactly what Google’s new Gemma 4 QAT chec...
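To put the 26-billion-parameter and 15 GB figures side by side, here is a rough back-of-the-envelope sketch. It assumes the 72% reduction in the headline is measured against a 16-bit checkpoint, which the excerpt does not state, and it counts weight storage only (no KV cache, activations, or runtime overhead). The helper name `weight_memory_gb` is hypothetical and used purely for illustration.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


PARAMS = 26e9  # 26 billion parameters, as quoted in the article

# Assumption: the baseline is stored at 16 bits per weight.
baseline_gb = weight_memory_gb(PARAMS, 16)

# Idealized 4-bit storage: weights only, no scales or higher-precision layers.
four_bit_gb = weight_memory_gb(PARAMS, 4)

# Size implied by a 72% reduction from the assumed 16-bit baseline.
shrunk_gb = baseline_gb * (1 - 0.72)

print(f"16-bit baseline:         {baseline_gb:.1f} GB")
print(f"4-bit weights only:      {four_bit_gb:.1f} GB")
print(f"72% smaller than 16-bit: {shrunk_gb:.1f} GB")
```

Under these assumptions the script prints 52.0 GB, 13.0 GB, and 14.6 GB, so a 72% cut from a 16-bit baseline lands close to the quoted 15 GB. The gap above the idealized 13 GB would be consistent with quantization metadata or some tensors kept at higher precision, though the excerpt does not say which.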