Quantization

Model Compression, Neural Networks, Precision Reduction, Efficient Inference

Joint Structural Pruning and Mixed-Precision Quantization for LLM Compression

 💻Local LLMs  Content type: Academic
arxiv.org

Gemma 4 QAT models: Optimizing model compression for mobile and laptop efficiency

 💻Local LLMs  Content type: News  Content type: Blog
blog.google · Hacker News

MiMo-v2.5-Pro-UltraSpeed: 1T model with 1000 TPS

 💻Local LLMs  Content type: Blog

Unsloth Gemma 4 QAT

 💻Local LLMs
unsloth.ai

Apple rebuilt its on-device AI stack at WWDC 2026

 🧪Data science  Content type: Blog
ziraph.com · Hacker News

GGUF vs GPTQ vs AWQ: The Plain-English Guide to LLM Quantization (and Which One to Pick)

 💻Local LLMs

LLM Research Papers: The 2026 List (January to May)

 💻Local LLMs  Content type: News

Understanding Quantization-Aware Training: Gradients at Quantized Weights Bias to the Low-Loss Basin

 💻Local LLMs  Content type: Academic
arxiv.org

FAIR-Calib: Frontier-Aware Instability-Reweighted Calibration for Post-Training Quantization of Diffusion Large Language Models

 💻Local LLMs  Content type: Academic
arxiv.org

Minimizing the Hidden Cost of Scales: Graph-Guided Ultra-Low-Bit Quantization for Large Language Models

 💻Local LLMs  Content type: Academic
arxiv.org

On Low-Bit Quantization Errors in Speaker Verification: Diagnostic and Mitigation

 💻Local LLMs  Content type: Academic
arxiv.org

STaR-Quant: State-Time Consistent Post-Training Quantization for Diffusion Large Language Models

 💻Local LLMs  Content type: Academic
arxiv.org

LLMCodec: Adapting Video Codecs for Efficient Weight Compression of Large Language Models

 💻Local LLMs  Content type: Academic
arxiv.org

QuBLAST: A Framework for Quantizing Large Language Models with Block-Level Compression Approach and Activation Scaling Strategy

 💻Local LLMs  Content type: Academic
arxiv.org

SEAM: Shortcut-Aware Real-Time Detection of Scripted vs. Spontaneous Speech for Interview Guardrails

 💻Local LLMs  Content type: Academic
arxiv.org

MorphoQuant: Modality-Aware Quantization for Omni-modal Large Language Models

 💻Local LLMs  Content type: Academic
arxiv.org

Knowledge Distillation for Visual Autoregressive Models

 📐Projective Geometry  Content type: Academic
arxiv.org

Value-and-Structure Alignment for Routing-Consistent Quantization of Mixture-of-Experts Models

 💻Local LLMs  Content type: Academic
arxiv.org

TempoVLA: Learning Speed-Controllable Vision-Language-Action Policies

 🎙️Whisper  Content type: Academic
arxiv.org

SecRL-Prune: Structured Reinforcement Learning-Based Pruning of CodeLLMs for Preserving Adversarial Code Mutation

 📡Information theory  Content type: Academic
arxiv.org
