🧠 LLMs - kevincrane · Scour

harshuljain13/llm-inference-at-scale: A Practitioner handbook for production llm serving.

🤖AI Engineering Code

github.com · Hacker News, r/LLM

Intelligent inference scheduling with llm-d on Red Hat AI

📐System Design

developers.redhat.com

General-purpose large language models outperform specialized clinical AI tools on medical benchmarks

🤖AI Engineering Academic

nature.com · Hacker News

Multi-Bitwidth Quantization for LLMs Using Additive Codebooks

🔍RAG Academic

147th airhacks tv: Local LLMs, LightMetal, ZSmith Agents, AI Rails, Saving Tokens

🖥️Backend Development Blog

adambien.blog

Implications of Continual Learning for LLM Agents: Introduction

🛡️AI Safety

lesswrong.com

A system programmer’s guide to LLM inference

🔍RAG Blog

blog.xiangpeng.systems · Hacker News

How to Run an LLM Locally: Ultimate Guide to Local AI 2026

🤖AI Engineering Blog

cswithsanjay.blogspot.com

Introducing LLM as a Judge: Scaling search relevance evaluation with AI

🔍RAG Blog

opensearch.org

Unsloth Minimax M3 GGUF

🤖AI Engineering

huggingface.co · r/LocalLLaMA

What Are Tokens in LLMs?

🔍RAG Blog

bearisland.dev · Hacker News

Tokenization Consulting in the USA: The Ultimate Guide to RWA Compliance

🖥️Backend Development

Lowest-Cost LLM Inference: The Complete OpenRouter Guide

📐System Design Blog Discussion Tutorial

openrouter.ai

Making a Vintage LLM from Scratch

🤖AI Engineering

crlf.link · Hacker News

2x GH200 for LLM inference, Part 2: vLLM, DeepSeek V4 Flash, and MTP

🤖AI Engineering Blog

dnhkng.github.io

WhatLLM.org: Compare LLMs by Benchmarks, Price & Speed

🤝AI Agents Discussion Reference

Inferoa AI harness claimed 90% cache savings. We ran it and measured 97.8%

🤖AI Engineering

zozo123.github.io · Hacker News

I ran local LLMs on my phone for a month, and now my desktop setup feels like overkill

🤖AI Engineering

xda-developers.com

Context windows in AI: why every token is a budget decision

🖥️Backend Development Blog

local llm on laptop 780M GPU using llama + gemma 4 qat

🛡️AI Safety Blog

alper.bearblog.dev
