🧠 LLMs - niss36 · Scour

harshuljain13/llm-inference-at-scale: A Practitioner handbook for production llm serving.

⚙️Compilers Code

github.com··Hacker News

A system programmer’s guide to LLM inference

⚙️Compilers Blog

blog.xiangpeng.systems··Hacker News

Alignment Collapse Under KV Cache Quantization: Diagnosis and Mitigation

⚙️Compilers Academic

Initial impressions of Claude Fable 5

🕸️WebAssembly

simonwillison.net··Hacker News

lightmetal: GPU LLM Inference From a Single Java 25 JAR

⚙️Compilers Blog

adambien.blog·

Using Scikit-LLM with Open-Source LLMs

🛠️Developer Tools

machinelearningmastery.com·

LLM Routing: From Strategy Selection to Production Architecture

⚙️Compilers Blog

2x GH200 for LLM inference, Part 2: vLLM, DeepSeek V4 Flash, and MTP

⚙️Compilers Blog

dnhkng.github.io·

NVIDIA Accelerates Google DeepMind’s DiffusionGemma for Local AI

🕹️Game Dev Blog

blogs.nvidia.com·

Slack bot for the whole team, not per-seat

🤖Coding Agents Discussion

plugand.ai··Hacker News

How LLMs work | Practical Leaders

practical-leaders.com··Hacker News

Report: GKE Inference Gateway delivers up to 92% faster AI responses

🤖Coding Agents Blog

cloud.google.com··Hacker News

Inferoa AI harness claimed 90% cache savings. We ran it and measured 97.8%

zozo123.github.io··Hacker News

Why LLMs (still) lack taste

🤖Coding Agents

beyondtheprior.com··Hacker News

Transitioning from Azure Language Features to Foundry Models

🤖Coding Agents

techcommunity.microsoft.com·

The Inference Alpha: Maximizing Frontier Models on AMD

λType Systems Blog

digitalocean.com·

Using local LLMs for agentic coding

🛠️Developer Tools Blog

blog.alexewerlof.com·

What's in the Box? A Field Guide to AI Models

λType Systems Blog

iankduncan.com·

The Bill Arrives: How to Manage Agentic AI Costs at Scale

🤖Coding Agents Blog

cockroachlabs.com·

DiffusionGemma: 4x Faster Text Generation

🕹️Game Dev News Blog

blog.google··Hacker News, r/LocalLLaMA, r/singularity
