📉 Model Quantization - miterion · Scour

baidu-baige/LoongForge: A modular, scalable, high-performance training framework for LLMs, VLMs, diffusion, and embodied models. 🏎️TensorRT

github.com·4h·Hacker News

AMD Confirms FSR 4.1 Support for Radeon RX 7000 in July, RX 6000 GPUs Get it in 2027 🔍Nsight

gizchina.com·6d

A cheap fix that saves the AI $400M dollars a year and brings 4B people online 🔄ONNX

codecai.net·4d·Hacker News

CAR-SAM: Cross-Attention Reconstruction for Post-Training Quantization of the Segment Anything Model 🧩Attention Kernels

Whisper.cpp vs Faster-Whisper: Why Speed Tests Lie 📊Profiling Tools

tildalice.io·4d

AMD surprises RDNA 3 and RDNA 2 owners with FSR 4.1 support, arriving in July for RX 7000 series 🔍Nsight

tweaktown.com·6d

AMD FSR4 · Issue #2 · Korthos-Software/low_latency_layer 📈GPU Occupancy

github.com·2d·r/linux_gaming

AsymFlow Claims More Realistic AI Images by Moving Beyond Latent Diffusion 🏎️TensorRT

firethering.com·4d·Hacker News

Rockchip unveils RK3572 processor with 4 TOPS NPU and LPDDR5X support 🧠CPU Architecture

linuxgizmos.com·3d

Nonlinear Bipolar Compensation: Handling Outliers in Post-Training Quantization 📊Gradient Accumulation

AMD makes FSR 4 upscaling official for Radeon RX 7000- and 6000-series cards — RDNA 3 and RDNA 2 chips will soon enjoy improved visuals 🎮NVIDIA

tomshardware.com·6d

AI boom is pricing out PC builders and bootstrapped AI startups 🤖AI Coding Tools

startupfortune.com·4d

AMD's FSR 4 coming to RDNA 2 could give the Xbox Series X a PS5 Pro-like upgrade 🔧PTX

tweaktown.com·6d

MARR: Module-Adaptive Residual Reconstruction for Low-Bit Post-Training Quantization 🏎️TensorRT

TriAxialKV: Toward Extreme Low-Precision KV-Cache Quantization for Agentic Inference Tasks 🎯Tensor Cores

KV Cache Optimization: 3x Faster LLM Inference on 24GB VRAM 🎛️CUDA Optimization

tildalice.io·6d

PyTorch, rewritten from scratch in pure Rust 📜TorchScript

github.com·6d·Hacker News

Runtime-Certified Bounded-Error Quantized Attention 👁️Attention Optimization

Breaking Modality Heterogeneity in Low-Bit Quantization for Large Vision-Language Models 🏎️TensorRT

LoopQ: Quantization for Recursive Transformers 🚀Compiler Optimization

Log in to enable infinite scrolling