Burn
Re-quantizing a local LLM 14x faster by skipping the tensors that didn't change
💻 Local LLMs | Content type: News, Blog

TensorBench: Benchmarking Coding Agents on a Compiler-Based Tensor Framework
🔥 PyTorch | Content type: Academic

lbj96347/nemotron-3.5-asr-ios: On-device, offline speech recognition for iPhone/iPad using NVIDIA's Nemotron-3.5-ASR Streaming 0.6B (multilingual) via CoreML. SwiftUI app with mic capture + audio file import, RNN-T decoding, and live benchmark metrics (latency, RTF, memory).
🎙️ Whisper | Content type: Code

Gemma 4 QAT models: Optimizing model compression for mobile and laptop efficiency
💻 Local LLMs | Content type: News, Blog