Distributed LLM Inference with LLM-d
An introduction to llm-d, an open-source, LLM-aware router that schedules requests across inference engines such as vLLM, choosing a target based on KV cache locality and GPU utilization.
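To make the routing idea concrete, here is a minimal sketch of how a request could be scored against candidate replicas using the two signals named above. It is not llm-d's implementation: the `Replica` class, the character-based `prefix_blocks` helper, and the `cache_weight` and `load_weight` values are all hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical view of one inference-engine replica (not an llm-d type)."""
    name: str
    gpu_utilization: float  # fraction of GPU capacity in use, in [0, 1]
    cached_prefixes: set[str] = field(default_factory=set)


def prefix_blocks(prompt: str, block_size: int = 16) -> list[str]:
    """Return the cumulative prefixes of a prompt, one per full block.

    Assumption: blocks are measured in characters for simplicity;
    a real engine would work on token blocks.
    """
    return [prompt[:end] for end in range(block_size, len(prompt) + 1, block_size)]


def cache_hit_ratio(replica: Replica, prompt: str, block_size: int = 16) -> float:
    """Fraction of the prompt's leading blocks already cached on the replica."""
    blocks = prefix_blocks(prompt, block_size)
    if not blocks:
        return 0.0
    hits = 0
    for block in blocks:
        if block not in replica.cached_prefixes:
            break  # a prefix cache is only reusable up to the first miss
        hits += 1
    return hits / len(blocks)


def pick_replica(
    replicas: list[Replica],
    prompt: str,
    cache_weight: float = 0.7,  # assumed weight, not taken from llm-d
    load_weight: float = 0.3,   # assumed weight, not taken from llm-d
) -> Replica:
    """Pick the replica with the best blend of cache locality and free GPU."""
    def score(replica: Replica) -> float:
        locality = cache_hit_ratio(replica, prompt)
        headroom = 1.0 - replica.gpu_utilization
        return cache_weight * locality + load_weight * headroom

    return max(replicas, key=score)


if __name__ == "__main__":
    prompt = "a" * 64
    warm = Replica("warm", 0.8, set(prefix_blocks(prompt[:48])))
    cold = Replica("cold", 0.2)
    print(pick_replica([warm, cold], prompt).name)
```

With these assumed weights, a busy replica that already holds most of the prompt's prefix can still beat an idle replica with an empty cache, which is the trade-off a cache-aware router has to balance.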