LLMs


harshuljain13/llm-inference-at-scale: A Practitioner handbook for production llm serving.

 🤖AI Engineering  Content type: Code
github.com · Hacker News, r/LLM

Intelligent inference scheduling with llm-d on Red Hat AI

 📐System Design

General-purpose large language models outperform specialized clinical AI tools on medical benchmarks

 🤖AI Engineering  Content type: Academic
nature.com · Hacker News

Multi-Bitwidth Quantization for LLMs Using Additive Codebooks

 🔍RAG  Content type: Academic
arxiv.org·

147th airhacks tv: Local LLMs, LightMetal, ZSmith Agents, AI Rails, Saving Tokens

 🖥️Backend Development  Content type: Blog
adambien.blog·

Implications of Continual Learning for LLM Agents: Introduction

 🛡️AI Safety
lesswrong.com·

A system programmer’s guide to LLM inference

 🔍RAG  Content type: Blog

How to Run an LLM Locally: Ultimate Guide to Local AI 2026

 🤖AI Engineering  Content type: Blog
cswithsanjay.blogspot.com·

Introducing LLM as a Judge: Scaling search relevance evaluation with AI

 🔍RAG  Content type: Blog
opensearch.org·

Unsloth Minimax M3 GGUF

 🤖AI Engineering

What Are Tokens in LLMs?

 🔍RAG  Content type: Blog

Tokenization Consulting in the USA: The Ultimate Guide to RWA Compliance

 🖥️Backend Development
ziuma.com·

Lowest-Cost LLM Inference: The Complete OpenRouter Guide

 📐System Design  Content type: Blog, Discussion, Tutorial
openrouter.ai·

Making a Vintage LLM from Scratch

 🤖AI Engineering
crlf.link · Hacker News

2x GH200 for LLM inference, Part 2: vLLM, DeepSeek V4 Flash, and MTP

 🤖AI Engineering  Content type: Blog
dnhkng.github.io·

WhatLLM.org: Compare LLMs by Benchmarks, Price & Speed

 🤝AI Agents  Content type: Discussion, Reference
whatllm.org·

Inferoa AI harness claimed 90% cache savings. We ran it and measured 97.8%

 🤖AI Engineering

I ran local LLMs on my phone for a month, and now my desktop setup feels like overkill

 🤖AI Engineering
xda-developers.com·

Context windows in AI: why every token is a budget decision

 🖥️Backend Development  Content type: Blog
redis.io·

local llm on laptop 780M GPU using llama + gemma 4 qat

 🛡️AI Safety  Content type: Blog
alper.bearblog.dev·
