⚡ Quantization - jhcha.oyo · Scour

UniSVQ: 2-bit Unified Scalar-Vector Quantization

📊Vector Quantization Academic

[AINews] not much happened today

📉Technical Analysis News

mtmd : add video input support by ngxson · Pull Request #24269 · ggml-org/llama.cpp

🎮Godot Code

github.com · r/LocalLLaMA

Trainable Smooth-Rotation Transforms with Learned Channel Scales for LLM Quantization

🎛️Fine-tuning Academic

The Edge LLM Offload Story

semiengineering.com

LC-QAT: Data-Efficient 2-Bit QAT for LLMs via Linear-Constrained Vector Quantization

📊Vector Quantization Academic

Show HN: Ext-Infer

infer.displace.tech · Hacker News

Joint Structural Pruning and Mixed-Precision Quantization for LLM Compression

💬LLMs Academic

mtp: support for gemma-4 E2B and E4B assistants by max-krasnyansky · Pull Request #24282 · ggml-org/llama.cpp

💬LLMs Code

github.com · r/LocalLLaMA

Google Gemma 4 12B: Architecture, Benchmarks, Access, and Hands-on Guide for Developers

💬LLMs Blog

analyticsvidhya.com

FAIR-Calib: Frontier-Aware Instability-Reweighted Calibration for Post-Training Quantization of Diffusion Large Language Models

🎛️Fine-tuning Academic

Less-relevant results

[AINews] Reve 2 and Ideogram 4: Layouts in Imagegen

🎮Reinforcement Learning

On Low-Bit Quantization Errors in Speaker Verification: Diagnostic and Mitigation

📊Vector Quantization Academic

defai-digital/ax-engine: Apple Silicon LLM runtime supporting Gemma 4 and Qwen 3.6 MTP modes

🤖AI Code

github.com · Hacker News

not much happened today | AINews

ScaleSweep: Accurate NVFP4 Post-Training Quantization of LLMs via Block Scale Initialization

💬LLMs Academic

bigattichouse/packed-twin-inference: PTI achieves ~2× throughput using a single quantized model (Q5_K_M or better) by running 4 generation streams in one batched decode call. The GPU loads model weights once per step and produces 4 predictions simultaneously. KV cache overhead is ~0.8 GiB total for all 4 streams. No draft model. No quality loss

💬LLMs Code

github.com · r/LocalLLaMA

Sigma-Branch: Hierarchical Single-Path Network Reconstruction for Dynamic Inference with Reduced Active Parameters

🤖AI Academic

FADA: Accessible fetal ultrasound interpretation and annotation with a selectively distilled unified vision-language model

🧠Deep Learning Academic

harshuljain13/llm-inference-at-scale: A Practitioner handbook for production llm serving.

🤖AI Code

github.com · Hacker News
