🔢 FP8 Training - nayyara.airlangga · Scour

Train Models Faster with JAX and MaxText Using NVFP4 on NVIDIA Blackwell

💰Inference Cost News Blog

developer.nvidia.com·

Alignment Collapse Under KV Cache Quantization: Diagnosis and Mitigation

💰Inference Cost Academic

Nvidia DGX Spark GB10 – AI Models and Guide with vLLM and Autonomous Script

⏱️Prefill Decoding Code

github.com··Hacker News

2x GH200 for LLM inference, Part 2: vLLM, DeepSeek V4 Flash, and MTP

🧠Inference Engineering Blog

dnhkng.github.io·

Less-relevant results

Youssof Altoukhi (@Youssofal_)

🧠Inference Engineering

xcancel.com··r/LocalLLaMA

MiMo-v2.5-Pro-UltraSpeed: 1T model with 1000 TPS

💰Inference Cost Blog

mimo.xiaomi.com··Hacker News, r/LocalLLaMA

A system programmer’s guide to LLM inference

💰Inference Cost Blog

blog.xiangpeng.systems··Hacker News

DeepSeek V4, LeCun's Bet Against LLMs, and Lovable's Self-Improving Agent - The Tokenizer Edition #30

🧠Inference Engineering

newsletter.artofsaience.com·

libertywing/FlashMemory-Deepseek-V4: FlashMemory DS-V4 Retriever: a lightweight retriever that sparsifies DeepSeek-V4 CSA KV-cache. Weights available on Hugging Face.

🧠Inference Engineering Code

DeepSeekV4 1.6T Day 0 to Day 43 Performance Over Time - Huawei, GB300 NVL72, MI355X, B200

🧠Inference Engineering News

newsletter.semianalysis.com··Hacker News

Achieving Cloud-Grade SLOs for Local Mixture-of-Experts Inference through CPU-GPU Hybrid Design

💻Systems Programming Academic

Apple rebuilt its on-device AI stack at WWDC 2026

🔢GEMM Optimization Blog

ziraph.com··Hacker News

3x Faster Search: Parallel Test-Time Scaling with Instructed-Retriever-1

💰Inference Cost Blog

databricks.com·

The economics of speculative decoding

🚀Speculative Decoding Blog

fergusfinn.com··Hacker News

{ "id": "247ea069-731d-4b79-9d64-8807463de95c", "revision": 0, "last_no

📡OpenTelemetry

pastebin.com··r/StableDiffusion

not much happened today | AINews

🧠Inference Engineering

An 84-Format Numeric Catalog with Bit-Exact Conformance Vectors: A Vendor-Neutral Reference for FP8, BF16, MXFP4, and Microscaling Formats

🪄Chiplet Design Academic

Speculators v0.5.0: DFlash support and online training

🚀Speculative Decoding

developers.redhat.com·

"North Mini Code"; open weights, 30B param, Canadian coding model

⏱️Prefill Decoding Blog

cohere.com··Hacker News

Gigabyte AI Top 500: Local 600B Parameter LLM Desktop Training Hardware

🎮GPU Computing

armdevices.net·
