⚡ ML Inference - bugrakadirhan · Scour

harshuljain13/llm-inference-at-scale: A Practitioner handbook for production llm serving.

🤖Machine Learning Code

github.com··Hacker News

DeepSeekV4 1.6T Day 0 to Day 43 Performance Over Time - Huawei, GB300 NVL72, MI355X, B200

🤖Machine Learning News

newsletter.semianalysis.com··Hacker News

AI Serving Platform That Adapts to Your Model

🔄MLOps Blog

databricks.com·

Breaking the Ice: Analyzing Cold Start Latency in vLLM

🔧MLIR Academic

arxiv.org··Hacker News

Inferoa AI harness claimed 90% cache savings. We ran it and measured 97.8%

zozo123.github.io··Hacker News

Running LLM Inference on Kubernetes: What It Actually Takes

🔗Distributed Training Blog

fairwinds.com·

The Inference Alpha: Maximizing Frontier Models on AMD

📐Model Architecture Blog

digitalocean.com·

Infrastructure Options for Scalable AI Inference

⚙️Systems Programming Blog

NVIDIA Accelerates Google DeepMind’s DiffusionGemma for Local AI

🎮GPU Programming Blog

blogs.nvidia.com·

OpenCV Introduces New DNN Inference Engine

🤖Machine Learning

i-programmer.info·

Pruned YOLOv8 ONNX INT8 Fails: 3 Fixes That Work

🗜️Quantization Blog Discussion

Mobile AI Compute Engine (MACE) inference framework — Vision SDK

🛠️ML Frameworks Blog

RKSC: Reasoning-Aware KV Cache Sharing and Confident Early Exit for Multi-Step LLM Inference

🧠Deep Learning Academic

Magenta RealTime 2: Open and Local Live Music Models

🧠Deep Learning

magenta.withgoogle.com··Hacker News, Hacker News, r/LocalLLaMA

OpenCV 5.0 Computer Vision Library Released with Rewritten DNN Engine

🤖Machine Learning

DiffusionGemma: 4x Faster Text Generation

🧠Deep Learning News Blog

blog.google··Hacker News, r/LocalLLaMA, r/singularity

Token4Token — pay-per-token inference on Gnosis + Swarm

t4t.eth.link··Hacker News

MoQ GGUFs and GSQ: Low-Bit GGUFs Are About to Get Much Better

🖥️Systems ML News Blog

kaitchup.substack.com··r/LocalLLaMA

From GPU to Token: The 8-Layer Observability Stack for AI Infrastructure

🎮GPU Programming Blog

2x GH200 for LLM inference, Part 2: vLLM, DeepSeek V4 Flash, and MTP

🔗Distributed Training Blog

dnhkng.github.io·
