🏗️ AI Infrastructure - ndjenks · Scour

Breaking the Ice: Analyzing Cold Start Latency in vLLM

🫀Microkernels Academic

Inferoa AI harness claimed 90% cache savings. We ran it and measured 97.8%

⌨️CLI Tools

zozo123.github.io··Hacker News

Nvidia DGX Spark GB10 – AI Models and Guide with vLLM and Autonomous Script

⚡Concurrency Code

github.com··Hacker News

Nissan Slams The Door On A Nismo Navara, Mitsubishi Cracks One Open For Ralliart

carscoops.com·

Distributed multi-agent systems with Aspire and Microsoft Agent Framework

🌐Distributed Systems Blog

devblogs.microsoft.com·

Pruned YOLOv8 ONNX INT8 Fails: 3 Fixes That Work

⚡Dynamic Recompilation Blog Discussion

DeepSeekV4 1.6T Day 0 to Day 43 Performance Over Time - Huawei, GB300 NVL72, MI355X, B200

🗄️CUDA Memory News

newsletter.semianalysis.com··Hacker News

No Token Left Behind: Demystifying Token-in-Token-Out in Miles

📡Protocol Design Blog

lmsys.org··Hacker News

LLM Inference Engineering Room — Part 3: The Orchestration Layer

📋Formal Methods Blog

vimal-dwarampudi.medium.com·

OpenCV Introduces New DNN Inference Engine

i-programmer.info·

google/gemma-4-31B-it · fix: chat template — null handling, reasoning preservation, turn-tag balance, input validation

huggingface.co··r/LocalLLaMA

Gemma 4 QAT models: Optimizing model compression for mobile and laptop efficiency

🎮GPU Microarchitecture News Blog

blog.google··Hacker News

2x GH200 for LLM inference, Part 2: vLLM, DeepSeek V4 Flash, and MTP

⚡Concurrency Blog

dnhkng.github.io·

LC-QAT: Data-Efficient 2-Bit QAT for LLMs via Linear-Constrained Vector Quantization

🔧Compilers Academic

New comment by bhvk08 in "Ask HN: Who wants to be hired? (June 2026)"

⚙️Low-Level Programming Discussion

news.ycombinator.com··Hacker News

From GPU to Token: The 8-Layer Observability Stack for AI Infrastructure

🫀Microkernels Blog

Embedding pipelines are the new ETL

🗄️Vector Databases Blog

infoworld.com·

OpenCV 5 release - New DNN engine with enhanced ONNX and LLM/VLM support, Intel, Arm, and RISC-V hardware optimizations - CNX Software

🕹️Emulators News

cnx-software.com·

OpenCV 5.0 Computer Vision Library Released with Rewritten DNN Engine

🗄️Vector Databases

Speculators v0.5.0: DFlash support and online training

🗄️Backend Dev

developers.redhat.com·
