⚡ Quantization - buckman · Scour

Chapter 5: Linear Transformation and Softmax 📐Linear Algebra

dev.to·2d·DEV

DeepSeek v4 🚀Performance

news.smol.ai·3d

The other paper that killed deep learning theory 📊ML Research

lesswrong.com·2h

Types and Neural Networks 💻Local LLMs

brunogavranovic.com·6d·Hacker News

Qwen3.6-27B: Flagship-Level Coding in a 27B Dense Model ⚡Inference

simonwillison.net·4d

TurboQuant: A First-Principles Walkthrough 🔢NumPy

arkaung.github.io·7h·Hacker News

The Last Pivot: Why Quality Gates Killed My Final KV-Cache Speedup 🚀Performance

dev.to·4h·DEV

10GB VRAM Local LLM: The Complete Setup Guide (2026) 🟩Nvidia

sitepoint.com·4d

I Tried to Run VGG19 on a CPU… It Failed. So I Fixed It. 🤖LLM Inference

github.com·5d·DEV

Feature Extraction + Head 👁️Computer Vision

·2d

The Evolution of Nvidia Blackwell GPU Memory Architecture ⚡Hardware Acceleration

freecodecamp.org·5d

RTX 4090 Cooling, LLM KV Cache Quantization, & Deepseek V4 Flash Models 🟩Nvidia

dev.to·2d·DEV

AsishKumarDalal/memoryllm: using differentiable neural computer architecture with GPT2 to provide memory ⚡Inference

github.com·2d·DEV

SGLang CVE-2026-5760 (CVSS 9.8) Enables RCE via Malicious GGUF Model Files 🛡️Parser Security

thehackernews.com·6d

Anker's 'Thus' chip brings AI to its headphones and other products 🔌Neurotech

engadget.com·4d

Research Log: Monet/PEER sparse experts 📊ML Research

lesswrong.com·4d

Deepseek v4 Flash, Gemma/Qwen KV Cache Quantization & 384K Context ⚡Inference

dev.to·2d·DEV

Stop Paying the Abstraction Tax: How I Built a C-Engine 12x Faster than Pandas 📊Columnar Databases

github.com·3h·DEV

I Built a Glossary of LLM Terms That Actually Explains What They Change in Production 🧠LLM Tooling

dev.to·2d·DEV

Training a Transformer to Compose One Step Per Layer (and Proving It) 🤖Large Language Models

lesswrong.com·9h
