Quantization

Quality Is Not a Safety Proxy Under Quantization

 🔐Cryptography  Content type: Academic
arxiv.org

Show HN: Run Llama.cpp In-Process from Java with Project Panama FFM

 💬LLMs

Google releases Gemma 4 QAT models for local AI on enterprise laptops

 Hardware Acceleration
4sysops.com

fix(memory): move local llama.cpp runtime to provider plugin · openclaw/openclaw@3137110

 💬LLMs  Content type: Code
github.com

Apples to Apples: MLX vs. Llama.cpp for Gemma 4 12B on an M1 16GB

 💬LLMs  Content type: Blog
ziraph.com · Hacker News

UniSVQ: 2-bit Unified Scalar-Vector Quantization

 📊Vector Quantization  Content type: Academic
arxiv.org

Running Qwen 35B MoE at 450k Context on a Single 32GB GPU

 ✍️Prompt Engineering

Gemma 4 12B: A unified, encoder-free multimodal model

 💬LLMs  Content type: Discussion

A system programmer’s guide to LLM inference

 🔤Tokenization  Content type: Blog

Ideogram4 GGUF is out!

 🎨Generative AI

Joint Structural Pruning and Mixed-Precision Quantization for LLM Compression

 💬LLMs  Content type: Academic
arxiv.org

2x GH200 for LLM inference, Part 2: vLLM, DeepSeek V4 Flash, and MTP

 💬LLMs  Content type: Blog
dnhkng.github.io

BeeLlama.cpp DFlash on Strix Halo: 2.7x Gemma 31B, But MTP Is Still Faster

 🎮Game Engines
sleepingrobots.com

Trainable Smooth-Rotation Transforms with Learned Channel Scales for LLM Quantization

 🎛️Fine-tuning  Content type: Academic
arxiv.org

MiMo-v2.5-Pro-UltraSpeed: 1T model with 1000 TPS

 Speculative Decoding  Content type: Blog

LC-QAT: Data-Efficient 2-Bit QAT for LLMs via Linear-Constrained Vector Quantization

 📊Vector Quantization  Content type: Academic
arxiv.org

FAIR-Calib: Frontier-Aware Instability-Reweighted Calibration for Post-Training Quantization of Diffusion Large Language Models

 🎛️Fine-tuning  Content type: Academic
arxiv.org

stable-diffusion.cpp/docs/quantization_and_gguf.md at master · leejet/stable-diffusion.cpp

 🤖AI  Content type: Code

On Low-Bit Quantization Errors in Speaker Verification: Diagnostic and Mitigation

 📊Vector Quantization  Content type: Academic
arxiv.org

google/gemma-4-12B-it-qat-q4_0-gguf

 🤖AI
huggingface.co
