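Google has released a new Gemma 4 model with 12B parameters; judging by the description, it is not MoE. On both HF and Kaggle the weights are in BF16, and the weight files total about 23.9GB. In its blog post Google specifically highlights "Laptop ready: Small enough to run locally with just 16GB of VRAM or unified memory." How does it manage to run in 16GB of VRAM? Or is it that the BF16 version can't run, and only an FP8-quantized one can? But plenty of models can already run on a 16GB card after that kind of quantization, including many with larger parameter counts.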
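For reference, here is the back-of-the-envelope arithmetic behind the question, as a rough sketch. It assumes a nominal 12e9 parameters and counts only the weights (no KV cache, activations, or runtime overhead). The `weight_gb` helper and the `int4` row are my own illustration, not something taken from Google's announcement.

```python
# Rough weight-only memory estimate for a dense 12B-parameter model.
# Assumptions: nominal 12e9 parameters, 1 GB = 1e9 bytes, and no KV cache,
# activations, or framework overhead (real usage will be higher).

BYTES_PER_PARAM = {
    "bf16": 2.0,  # 16 bits per weight
    "fp8": 1.0,   # 8 bits per weight
    "int4": 0.5,  # 4 bits per weight (illustrative, not from the post)
}

NUM_PARAMS = 12e9   # nominal parameter count
VRAM_GB = 16.0      # the memory budget quoted in the blog post


def weight_gb(num_params: float, dtype: str) -> float:
    """Return the size of the weights alone, in GB, for the given dtype."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        size = weight_gb(NUM_PARAMS, dtype)
        verdict = "fits" if size < VRAM_GB else "does not fit"
        print(f"{dtype}: {size:.1f} GB of weights, {verdict} in {VRAM_GB:.0f} GB")
```

At 2 bytes per parameter the weights come to roughly 24GB, which lines up with the 23.9GB of BF16 files and is already over a 16GB budget before any cache or activations are counted. Only the 1-byte and smaller formats leave headroom.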
