🚀 LLM Deployment - ibrahimsharaf · Scour

The Inference Bottleneck: Architecting Kubernetes Autoscaling for Production LLMs 💻Local AI

cloudnativenow.com·5d

LLM Inference 🧠LLMs

iop.systems·2h

OSCAR: Offline Spectral Covariance-Aware Rotation for 2-bit KV Cache Quantization ⚡Quantization

I tried 4 LLM speedup techniques on CPU. Three made it slower. 🎯LLM Finetuning

deemwar-products.github.io·10h·Hacker News

InferenceBench: A Benchmark for Open-Ended Inference Optimization by AI Agents 💻Local AI

inferencebench.ai·5h·Hacker News

SuperInfer: SLO-Aware Rotary Scheduling and Memory Management for LLM Inference on Superchips 🎯LLM Finetuning

supercomputing-system-ai-lab.github.io·2d·Hacker News

I've updated my glorified Llama fork (LLM Inference Server) for P40's to utilise MTP + TurboQuant + DFlash ⚡Quantization

github.com·4d·r/LocalLLaMA

GPU Memory Math for LLMs: Formula That Tells You What Fits on Your GPU ⚡Quantization

theahmadosman.substack.com·7h·Substack, r/LocalLLaMA

Coding Agent Inference Benchmark Revealed 💻Local AI

startuphub.ai·1d

Four-Tier Memory Hierarchy for LLM Reasoning (USC, UW) 💻Local AI

semiengineering.com·11h

KV Cache Optimization: 3x Faster LLM Inference on 24GB VRAM ⚡Quantization

tildalice.io·6d

Building a Controllable Inference Platform on Kubernetes with AI Runway 💻Local AI

techcommunity.microsoft.com·2d

Command A+: Making sovereign agentic capabilities available to all 🤖AI Agents

cohere.com·12h·Hacker News

Understanding KV Cache: The Hidden Memory Cost of Serving LLMs ⚡Quantization

melchi.me·1d·Hacker News

Let AI Agents Write Your Serving Stack with VibeServe 💻Local AI

syfi.cs.washington.edu·6d·Hacker News

Intel llm-scaler-vllm PV 1.4 Released With Updated Components, Arc Pro B70 Support 🔬Small LMs

phoronix.com·18h

KV Cache Is Becoming the Memory Hierarchy of Inference ⚡Quantization

touchdown-labs.com·2d

CohereLabs/command-a-plus-05-2026-bf16 💻Local AI

huggingface.co·13h·r/LocalLLaMA

Build a Production-Grade Local LLM Stack (vLLM + CUDA + KV Cache Tuning) 🎯LLM Finetuning

·5d

Local LLMs are ready for real work 🎯LLM Finetuning

thelurkreport.beehiiv.com·2d·r/LocalLLaMA
