🧠 LLMs - kevincrane · Scour

harshuljain13/llm-inference-at-scale: A Practitioner handbook for production llm serving.

🤖AI Engineering · Code

github.com · Hacker News, r/LLM

Intelligent inference scheduling with llm-d on Red Hat AI

📐System Design

developers.redhat.com

General-purpose large language models outperform specialized clinical AI tools on medical benchmarks

🤖AI Engineering · Academic

Multi-Bitwidth Quantization for LLMs Using Additive Codebooks

🔍RAG · Academic

147th airhacks tv: Local LLMs, LightMetal, ZSmith Agents, AI Rails, Saving Tokens

🖥️Backend Development · Blog

adambien.blog

Unsloth Minimax M3 GGUF

🤖AI Engineering

huggingface.co · r/LocalLLaMA

A system programmer’s guide to LLM inference

🔍RAG · Blog

blog.xiangpeng.systems · Hacker News

Introducing LLM as a Judge: Scaling search relevance evaluation with AI

🔍RAG · Blog

opensearch.org

Tokenization Consulting in the USA: The Ultimate Guide to RWA Compliance

🖥️Backend Development

How to Run an LLM Locally: Ultimate Guide to Local AI 2026

🤖AI Engineering · Blog

cswithsanjay.blogspot.com

Implications of Continual Learning for LLM Agents: Introduction

🛡️AI Safety

lesswrong.com

What Are Tokens in LLMs?

🔍RAG · Blog

bearisland.dev · Hacker News

Making a Vintage LLM from Scratch

🤖AI Engineering

crlf.link · Hacker News

WhatLLM.org: Compare LLMs by Benchmarks, Price & Speed

🤝AI Agents · Discussion · Reference

Inferoa AI harness claimed 90% cache savings. We ran it and measured 97.8%

🤖AI Engineering

zozo123.github.io · Hacker News

2x GH200 for LLM inference, Part 2: vLLM, DeepSeek V4 Flash, and MTP

🤖AI Engineering · Blog

dnhkng.github.io

I ran local LLMs on my phone for a month, and now my desktop setup feels like overkill

🤖AI Engineering

xda-developers.com

Context windows in AI: why every token is a budget decision

🖥️Backend Development · Blog

How LLMs are Actually Trained

🔍RAG · News · Blog

blog.algomaster.io

Friday Five — June 12, 2026

🤖AI Engineering
