⚡ Quantization - jhcha.oyo · Scour

Quality Is Not a Safety Proxy Under Quantization

🔐Cryptography Academic

Show HN: Run Llama.cpp In-Process from Java with Project Panama FFM

deemwar-products.github.io · Hacker News

Google releases Gemma 4 QAT models for local AI on enterprise laptops

⚡Hardware Acceleration

fix(memory): move local llama.cpp runtime to provider plugin · openclaw/openclaw@3137110

💬LLMs Code

Apples to Apples: MLX vs. Llama.cpp for Gemma 4 12B on an M1 16GB

💬LLMs Blog

ziraph.com · Hacker News

UniSVQ: 2-bit Unified Scalar-Vector Quantization

📊Vector Quantization Academic

Running Qwen 35B MoE at 450k Context on a Single 32GB GPU

✍️Prompt Engineering

local-llm.utop.workers.dev · Hacker News

Gemma 4 12B: A unified, encoder-free multimodal model

💬LLMs Discussion

news.ycombinator.com · Hacker News

A system programmer’s guide to LLM inference

🔤Tokenization Blog

blog.xiangpeng.systems · Hacker News

Ideogram4 GGUF is out!

🎨Generative AI

huggingface.co · r/StableDiffusion

Joint Structural Pruning and Mixed-Precision Quantization for LLM Compression

💬LLMs Academic

2x GH200 for LLM inference, Part 2: vLLM, DeepSeek V4 Flash, and MTP

💬LLMs Blog

dnhkng.github.io

BeeLlama.cpp DFlash on Strix Halo: 2.7x Gemma 31B, But MTP Is Still Faster

🎮Game Engines

sleepingrobots.com

Trainable Smooth-Rotation Transforms with Learned Channel Scales for LLM Quantization

🎛️Fine-tuning Academic

MiMo-v2.5-Pro-UltraSpeed: 1T model with 1000 TPS

⚡Speculative Decoding Blog

mimo.xiaomi.com · Hacker News, r/LocalLLaMA

LC-QAT: Data-Efficient 2-Bit QAT for LLMs via Linear-Constrained Vector Quantization

📊Vector Quantization Academic

FAIR-Calib: Frontier-Aware Instability-Reweighted Calibration for Post-Training Quantization of Diffusion Large Language Models

🎛️Fine-tuning Academic

stable-diffusion.cpp/docs/quantization_and_gguf.md at master · leejet/stable-diffusion.cpp

🤖AI Code

github.com · r/StableDiffusion

On Low-Bit Quantization Errors in Speaker Verification: Diagnostic and Mitigation

📊Vector Quantization Academic

google/gemma-4-12B-it-qat-q4_0-gguf

huggingface.co
