🗜️ Quantization - bugrakadirhan · Scour

GGUF vs GPTQ vs AWQ: The Plain-English Guide to LLM Quantization (and Which One to Pick)

⚡ML Inference

vettedconsumer.com · Hacker News

Understanding Quantization-Aware Training: Gradients at Quantized Weights Bias to the Low-Loss Basin

🧠Deep Learning Academic

Gemma 4 QAT on 10GB Laptop: Local AI with 6.7GB VRAM

everylocalai.com · DEV

Linux 7.2 Preparing Intel Key Protection Technology "KPT" For Next-Gen QAT

⚙️Systems Programming

The latest Gemma 4 models use a training trick to slash their on-device memory footprint

🖥️Systems ML

androidauthority.com

Pruned YOLOv8 ONNX INT8 Fails: 3 Fixes That Work

⚡ML Inference Blog Discussion

Google Shrank Gemma 4 by 72% and Unsloth Fixed the 4-Bit Bug Nobody Else Caught on One 4090, and 4-Bit Shouldn’t Be This Good

⚡ML Inference Blog

towardsai.net

Gemma 4 QAT models: Optimizing model compression for mobile and laptop efficiency

⚡ML Inference News Blog

blog.google · Hacker News

LC-QAT: Data-Efficient 2-Bit QAT for LLMs via Linear-Constrained Vector Quantization

🔄MLOps Academic

Unsloth Gemma 4 QAT

🎮GPU Programming

Google DeepMind releases Gemma 4 QAT, but Unsloth developer Daniel Han warns naive llama.cpp conversions suffer accuracy loss

⚡ML Inference News

google/gemma-4-12B-it-qat-q4_0-gguf

⚡ML Inference

huggingface.co

Joint Structural Pruning and Mixed-Precision Quantization for LLM Compression

🖥️Systems ML Academic

local llm on laptop 780M GPU using llama + gemma 4 qat

🔄MLOps Blog

alper.bearblog.dev

Google releases Gemma 4 QAT models for local AI on enterprise laptops

⚡ML Inference

Quality Is Not a Safety Proxy Under Quantization

🖥️Systems ML Academic

Nvidia DGX Spark GB10 – AI Models and Guide with vLLM and Autonomous Script

⚡ML Inference Code

github.com · Hacker News

Optimal Post-Training Quantization Scales and Where to Find Them

🕸️Neural Networks Academic

Less-relevant results

FMU Celebrates Distinguished Alumni | Francis Marion University

🔗Distributed Training Academic

Florian Brand, Prime Intellect research engineer, adopts Gemma 4 E4B 6-bit quantized as his primary local Mac LLM

⚡ML Inference News

digg.com · Hacker News
