⚡ Inference - ratorcvn · Scour

LLM inference engine from scratch in C++ 🧠LLMs

anirudhsathiya.com·4d·Hacker News

Scheduling the Unschedulable: Taming Black-Box LLM Inference at Scale 🔧MLOps

arxiv.org·1d

The Engine Behind Modern LLM Inference, Part 1: Continuous Batching, PagedAttention, and the End of… 🧠LLMs

medium.com·11h

Inside LLM Inference: KV Cache, Prefill, and the Decode Bottleneck 🧠LLMs

pub.towardsai.net·23h

RetroInfer: A Vector Storage Engine for Scalable Long-Context LLM Inference 🧠LLMs

vldb.org·1d

Overcoming inference challenges 📊AI Models

redhat.com·3d

milanm/AutoGrad-Engine: A complete GPT language model (training and inference) in ~600 lines of pure C#, zero dependencies 🧠LLMs

github.com·13h·Hacker News

The case for Model-as-a-Service over self-managed inference 🔧MLOps

news.ycombinator.com·2d·Hacker News

Tech links of April 2026 🌐Distributed Systems

codeyarns.com·52m

I Ran My KYB Engine at Three Quantization Levels. Accuracy Didn't Move. Cost Dropped 6x. 🔧MLOps

walsenburgtech.com·11h·Hacker News

KV Cache in LLM Inference: From PagedAttention (2023) to Reasoning Model Bottlenecks (2026) 🧠LLMs

medium.com·2d

Inference Arena – new benchmark of local inference and training 📊AI Models

kvark.github.io·4d·Hacker News

How to achieve P90 sub-microsecond latency in a C++ FIX engine 🌐Distributed Systems

akinocal1.substack.com·6h·Substack

We Put a Gaming Box in the Inference Loop 📊AI Models

write.as·1d

Prediction: The "Inference Supercycle" Could Be Bigger Than the Training Boom. 1 Growth Stock to Own. 📊AI Models

finance.yahoo.com·10h

benchmarking inference of popular models on consumer hardware 🔧MLOps

inferena.tech·5d·Hacker News

Building the Blueprint for Premium Inference 📊AI Models

sambanova.ai·1d

Reducing P999 Latency in Distributed Databases with TiDB 8.5 🗄️Databases

pingcap.com·15h

UCCL-EP: Portable Expert-Parallel Communication 🔧MLOps

uccl-project.github.io·2d·Hacker News

Inside the LLM Black Box: The True Architecture of Latency and Cost 🧠LLMs

akanuri.medium.com·5d
