not much happened today | AINews

Covers 7 stories including Gemma 4 QAT models: Optimizing model compression for mobile and laptop efficiency

Gemma 4 QAT models: Optimizing model compression for mobile and laptop efficiency

Discussed on Hacker News

ggerganov/llama.cpp

Discussed on r/LocalLLaMA and DEV

KVarN: Variance-Normalized KV-Cache Quantization Mitigates Error Accumulation in Reasoning Tasks

huawei-csl/KVarN: KVarN is a native vLLM KV-cache quantization backend for your agents: 3-5x more context, throughput above FP16, and FP16-level accuracy. Calibration-free, one flag.

Discussed on Hacker News

huggingface.co

nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16

Discussed on Hacker News and r/LocalLLaMA

Unsloth Gemma 4 QAT

A First Comprehensive Study of TurboQuant: Accuracy and Performance

Discussed on r/LocalLLaMA