Quantization
Pruned YOLOv8 ONNX INT8 Fails: 3 Fixes That Work
⚡ ML Inference · Content type: Blog, Discussion

Gemma 4 QAT models: Optimizing model compression for mobile and laptop efficiency
⚡ ML Inference · Content type: News, Blog

LC-QAT: Data-Efficient 2-Bit QAT for LLMs via Linear-Constrained Vector Quantization
🔄 MLOps · Content type: Academic

Joint Structural Pruning and Mixed-Precision Quantization for LLM Compression
🖥️ Systems ML · Content type: Academic

Optimal Post-Training Quantization Scales and Where to Find Them
🕸️ Neural Networks · Content type: Academic

Less-relevant results