📉 Model Quantization - miterion · Scour

needle/docs/simple_attention_networks.md at main 👁️Attention Optimization

AMD Is Bringing Improved FSR 4 Upscaling To Its Older GPUs 🔧PTX

hardware.slashdot.org·5d

Quantized Machine Learning Models for Medical Imaging in Low-Resource Healthcare Settings 🏎️TensorRT

Why Gemma-4 26B MoE works in HuggingFace but breaks in prod inference engines 🔄ONNX

github.com·5d·Hacker News

FTerViT: Fully Ternary Vision Transformer 👁️Attention Optimization

Theory-optimal Quantization Based on Flatness 🏎️TensorRT

xxxn3m3s1sxxx/ATLAS-TQ1_0: TQ1.0 ternary inference engine for BitNet b1.58 on CPU. Pack + run Falcon3-1B/3B/7B/10B, no GPU needed. ✂️CUTLASS

github.com·3d·Hacker News

MegaTrain: Full Precision Training of 100B+ Parameter LLMs on a Single GPU 🏎️TensorRT

github.com·4d·Hacker News

K-Quantization and its Impact on Output Performance 🏎️TensorRT

GAMMA: Global Bit Allocation for Mixed-Precision Models under Arbitrary Budgets 🏎️TensorRT

SAFE-SVD: Sensitivity-Aware Fidelity-Enforcing SVD for Physics Foundation Models 🏎️TensorRT

TORQ: Two-Level Orthogonal Rotation for MXFP4 Quantization 🎯Tensor Cores

StatQAT: Statistical Quantizer Optimization for Deep Networks 🏎️TensorRT

Characterizing Learning in Deep Neural Networks using Tractable Algorithmic Complexity Analysis 📊Gradient Accumulation

Cross-Paradigm Knowledge Distillation: A Comprehensive Study of Bidirectional Transfer Between Random Forests and Deep Neural Networks for Big Data Applications 🎓Model Distillation

Robust Basis Spline Decoupling for the Compression of Transformer Models 🎓Model Distillation

A Hardware-Aware, Per-Layer Methodology for Post-Training Quantization of Large Language Models 📊Gradient Accumulation

Multi-Scale Dequant: Eliminating Dequantization Bottleneck via Activation Decomposition for Efficient LLM Inference 🎯Tensor Cores

Not All Tasks Quantize Equally: Fisher-Guided Quantization for Visual Geometry Transformer 🏎️TensorRT

Forgetting That Sticks: Quantization-Permanent Unlearning via Circuit Attribution 📊Gradient Accumulation