📊 Quantization - matmat · Scour

Joint Structural Pruning and Mixed-Precision Quantization for LLM Compression

💻Local LLMs Academic

Gemma 4 QAT models: Optimizing model compression for mobile and laptop efficiency

💻Local LLMs News Blog

blog.google··Hacker News

MiMo-v2.5-Pro-UltraSpeed: 1T model with 1000 TPS

💻Local LLMs Blog

mimo.xiaomi.com··Hacker News, r/LocalLLaMA

Unsloth Gemma 4 QAT

Apple rebuilt its on-device AI stack at WWDC 2026

🧪Data science Blog

ziraph.com··Hacker News

GGUF vs GPTQ vs AWQ: The Plain-English Guide to LLM Quantization (and Which One to Pick)

vettedconsumer.com··Hacker News

LLM Research Papers: The 2026 List (January to May)

💻Local LLMs News

magazine.sebastianraschka.com··Hacker News

Understanding Quantization-Aware Training: Gradients at Quantized Weights Bias to the Low-Loss Basin

💻Local LLMs Academic

FAIR-Calib: Frontier-Aware Instability-Reweighted Calibration for Post-Training Quantization of Diffusion Large Language Models

💻Local LLMs Academic

Minimizing the Hidden Cost of Scales: Graph-Guided Ultra-Low-Bit Quantization for Large Language Models

💻Local LLMs Academic

On Low-Bit Quantization Errors in Speaker Verification: Diagnostic and Mitigation

💻Local LLMs Academic

STaR-Quant: State-Time Consistent Post-Training Quantization for Diffusion Large Language Models

💻Local LLMs Academic

LLMCodec: Adapting Video Codecs for Efficient Weight Compression of Large Language Models

💻Local LLMs Academic

QuBLAST: A Framework for Quantizing Large Language Models with Block-Level Compression Approach and Activation Scaling Strategy

💻Local LLMs Academic

SEAM: Shortcut-Aware Real-Time Detection of Scripted vs. Spontaneous Speech for Interview Guardrails

💻Local LLMs Academic

MorphoQuant: Modality-Aware Quantization for Omni-modal Large Language Models

💻Local LLMs Academic

Knowledge Distillation for Visual Autoregressive Models

📐Projective Geometry Academic

Value-and-Structure Alignment for Routing-Consistent Quantization of Mixture-of-Experts Models

💻Local LLMs Academic

TempoVLA: Learning Speed-Controllable Vision-Language-Action Policies

🎙️Whisper Academic

SecRL-Prune: Structured Reinforcement Learning-Based Pruning of CodeLLMs for Preserving Adversarial Code Mutation

📡Information theory Academic
