Quantization

Model Compression, Neural Networks, Precision Reduction, Efficient Inference

Pruned YOLOv8 ONNX INT8 Fails: 3 Fixes That Work

💻 Local LLMs · Content type: Blog, Discussion
tildalice.io

Joint Structural Pruning and Mixed-Precision Quantization for LLM Compression

💻 Local LLMs · Content type: Academic
arxiv.org

Gemma 4 QAT models: Optimizing model compression for mobile and laptop efficiency

💻 Local LLMs · Content type: News, Blog
blog.google, Hacker News

The latest Gemma 4 models use a training trick to slash their on-device memory footprint

💻 Local LLMs
androidauthority.com

Unsloth Gemma 4 QAT

💻 Local LLMs
unsloth.ai

Google DeepMind releases Gemma 4 QAT, but Unsloth developer Daniel Han warns naive llama.cpp conversions suffer accuracy loss

💻 Local LLMs · Content type: News
digg.com

Google releases Gemma 4 QAT models for local AI on enterprise laptops

💻 Local LLMs
4sysops.com

UniSVQ: 2-bit Unified Scalar-Vector Quantization

📊 Vector Quantization · Content type: Academic
arxiv.org

Trainable Smooth-Rotation Transforms with Learned Channel Scales for LLM Quantization

💻 Local LLMs · Content type: Academic
arxiv.org

Understanding Quantization-Aware Training: Gradients at Quantized Weights Bias to the Low-Loss Basin

💻 Local LLMs · Content type: Academic
arxiv.org

LC-QAT: Data-Efficient 2-Bit QAT for LLMs via Linear-Constrained Vector Quantization

📊 Vector Quantization · Content type: Academic
arxiv.org

Optimal Post-Training Quantization Scales and Where to Find Them

💻 Local LLMs · Content type: Academic
arxiv.org

FAIR-Calib: Frontier-Aware Instability-Reweighted Calibration for Post-Training Quantization of Diffusion Large Language Models

💻 Local LLMs · Content type: Academic
arxiv.org

ScaleSweep: Accurate NVFP4 Post-Training Quantization of LLMs via Block Scale Initialization

💻 Local LLMs · Content type: Academic
arxiv.org

On Low-Bit Quantization Errors in Speaker Verification: Diagnostic and Mitigation

💻 Local LLMs · Content type: Academic
arxiv.org

Minimizing the Hidden Cost of Scales: Graph-Guided Ultra-Low-Bit Quantization for Large Language Models

💻 Local LLMs · Content type: Academic
arxiv.org

STaR-Quant: State-Time Consistent Post-Training Quantization for Diffusion Large Language Models

💻 Local LLMs · Content type: Academic
arxiv.org

LLMCodec: Adapting Video Codecs for Efficient Weight Compression of Large Language Models

💻 Local LLMs · Content type: Academic
arxiv.org

QuBLAST: A Framework for Quantizing Large Language Models with Block-Level Compression Approach and Activation Scaling Strategy

💻 Local LLMs · Content type: Academic
arxiv.org

MorphoQuant: Modality-Aware Quantization for Omni-modal Large Language Models

💻 Local LLMs · Content type: Academic
arxiv.org
