🚀 LLM Deployment - ibrahimsharaf · Scour

I've updated my glorified Llama fork (LLM Inference Server) for P40's to utilise MTP + TurboQuant + DFlash ⚡Quantization

github.com·4d·r/LocalLLaMA

LLM Inference 🧠LLMs

iop.systems·1h

OSCAR: Offline Spectral Covariance-Aware Rotation for 2-bit KV Cache Quantization ⚡Quantization

I tried 4 LLM speedup techniques on CPU. Three made it slower. 🎯LLM Finetuning

deemwar-products.github.io·9h·Hacker News

InferenceBench: A Benchmark for Open-Ended Inference Optimization by AI Agents 💻Local AI

inferencebench.ai·4h·Hacker News

SuperInfer: SLO-Aware Rotary Scheduling and Memory Management for LLM Inference on Superchips 🎯LLM Finetuning

supercomputing-system-ai-lab.github.io·2d·Hacker News

GPU Memory Math for LLMs: Formula That Tells You What Fits on Your GPU ⚡Quantization

theahmadosman.substack.com·7h·Substack, r/LocalLLaMA

Recent Developments in LLM Architectures: KV Sharing, mHC, and Compressed Attention ⚙️Transformers

magazine.sebastianraschka.com·4d·Hacker News, r/LocalLLaMA

Coding Agent Inference Benchmark Revealed 💻Local AI

startuphub.ai·1d

Four-Tier Memory Hierarchy for LLM Reasoning (USC, UW) 💻Local AI

semiengineering.com·10h

Command A+: Making sovereign agentic capabilities available to all 🤖AI Agents

cohere.com·11h·Hacker News

Building a Controllable Inference Platform on Kubernetes with AI Runway 💻Local AI

techcommunity.microsoft.com·2d

ROCm 7 on Strix Halo: Benchmarking the New Toolbox Images 🎯LLM Finetuning

sleepingrobots.com·4d

Understanding KV Cache: The Hidden Memory Cost of Serving LLMs ⚡Quantization

melchi.me·1d·Hacker News

Intel llm-scaler-vllm PV 1.4 Released With Updated Components, Arc Pro B70 Support 🔬Small LMs

phoronix.com·17h

KV Cache Is Becoming the Memory Hierarchy of Inference ⚡Quantization

touchdown-labs.com·2d

froggeric/Qwen3.6-27B-MTP-GGUF ⚡Quantization

huggingface.co·3d·DEV

I built a catalog of portable AI capability packs for coding agents. Is this useful or too abstract? 🤖AI Agents

doramagic.ai·15h·r/SideProject

Local LLMs are ready for real work 🎯LLM Finetuning

thelurkreport.beehiiv.com·2d·r/LocalLLaMA

DeepSeek Agent Harness: Technical deep-dive & the open-source blueprint 🤖AI Agents

dlcmh.github.io·2h·Hacker News
