🧠 LLMs - niss36 · Scour

harshuljain13/llm-inference-at-scale: A Practitioner handbook for production llm serving.

⚙️Compilers Code

github.com··Hacker News

lightmetal: GPU LLM Inference From a Single Java 25 JAR

⚙️Compilers Blog

adambien.blog·

Alignment Collapse Under KV Cache Quantization: Diagnosis and Mitigation

⚙️Compilers Academic

LLM Routing: From Strategy Selection to Production Architecture

⚙️Compilers Blog

Initial impressions of Claude Fable 5

🕸️WebAssembly

simonwillison.net··Hacker News

Using Scikit-LLM with Open-Source LLMs

🛠️Developer Tools

machinelearningmastery.com·

Slack bot for the whole team, not per-seat

🤖Coding Agents Discussion

plugand.ai··Hacker News

Report: GKE Inference Gateway delivers up to 92% faster AI responses

🤖Coding Agents Blog

cloud.google.com··Hacker News

Inferoa AI harness claimed 90% cache savings. We ran it and measured 97.8%

zozo123.github.io··Hacker News

How LLMs work | Practical Leaders

practical-leaders.com··Hacker News

Why LLMs (still) lack taste

🤖Coding Agents

beyondtheprior.com··Hacker News

The Inference Alpha: Maximizing Frontier Models on AMD

λType Systems Blog

digitalocean.com·

What's in the Box? A Field Guide to AI Models

λType Systems Blog

iankduncan.com·

A system programmer’s guide to LLM inference

⚙️Compilers Blog

blog.xiangpeng.systems··Hacker News

A Plea to the Labs: Let the Models Diagnose.

λType Systems Blog

tangent.bearblog.dev··Hacker News

Qwen 3.6 27B AutoRound GGUF, need your feedback

⚙️Compilers

huggingface.co··r/LocalLLaMA

NVIDIA Accelerates Google DeepMind’s DiffusionGemma for Local AI

🕹️Game Dev Blog

blogs.nvidia.com·

Using local LLMs for agentic coding

🛠️Developer Tools Blog

blog.alexewerlof.com·

You don't need Copilot for code completion, try this instead

🛠️Developer Tools

mistral.ai··r/GithubCopilot

Transitioning from Azure Language Features to Foundry Models

🤖Coding Agents

techcommunity.microsoft.com·
