📉 Model Quantization - miterion · Scour

DiRotQ: Rotation-Aware Quantization for 4-bit Diffusion Transformers 🏎️TensorRT

GPU Memory Math for LLMs: Formula That Tells You What Fits on Your GPU 📈GPU Occupancy

theahmadosman.substack.com·19h·Substack, r/LocalLLaMA

Qwen 27b MTP Config, Llama.cpp Single 3090 📊Profiling Tools

github.com·4d·r/LocalLLaMA

Why Shrinking an AI Model Often Makes It More Useful ⚡ONNX Runtime

siliconopera.com·1d

reComputer RK3576/RK3588 Edge AI computers are supported by reComputer AI Lab one-click deployment platform ⚡ONNX Runtime

cnx-software.com·3h

The custom AI ASIC state of play (May 2026) — Broadcom deals, Google TPUs, Meta MTIA & beyond 🔧PTX

tomshardware.com·4h

Need a second pair of eyes, this Qwen3.6 27B quant recipe consistently thinks less and is correct ⚡ONNX Runtime

huggingface.co·6d·r/LocalLLaMA

A compressed sensing neuromorphic processor for sparse signal classification 📊Gradient Accumulation

frontiersin.org·11h

Initial Benchmarks Of The SpacemiT K3 RVA23 RISC-V CPU With The K3 Pico-ITX 📈Occupancy Optimization

phoronix.com·1d·Hacker News

Qwen’s MTP test puts local AI back in startup math ⚡ONNX Runtime

startupfortune.com·6d

Command A+: Making sovereign agentic capabilities available to all 🤖AI Coding Tools

cohere.com·1d·Hacker News

GRIP-VLM: RL for Efficient Vision-Language Models 📊Gradient Accumulation

startuphub.ai·6d

michelangeloromerochisco/ternative: Inference engine for ternary-weight LLMs with runtime LoRA - the llama.cpp of BitNet models 🔄ONNX

github.com·1d·Hacker News

Forlinx rolls out FET3572-C SoM and OK3572-C board with Rockchip RK3572 🧠CPU Architecture

linuxgizmos.com·3d

Quant.npu: Enabling Efficient Mobile NPU Inference for on-device LLMs via Fully Static Quantization 🏎️TensorRT

Inside the M4 Apple Neural Engine, Part 2: ANE Benchmarks 🎯Tensor Cores

maderix.substack.com·3d·Substack

TFLite Model Conversion: 10 Commands That Actually Work 🔄ONNX

tildalice.io·3d

AMD promises to bring improved, hardware-backed FSR 4 upscaling to older Radeon GPUs 🎯GPU Kernels

arstechnica.com·6d

E-PMQ: Expert-Guided Post-Merge Quantization with Merged-Weight Anchoring 🏎️TensorRT

Quantization From First Principles: Build Your Own INT8 Inference Engine 🏎️TensorRT

·5d
