🤖 LLM Inference - anarcher · Scour

LLM Inference 🦙llama.cpp

iop.systems·1h

KV Cache Optimization: 3x Faster LLM Inference on 24GB VRAM 🦙llama.cpp

tildalice.io·6d

SuperInfer: SLO-Aware Rotary Scheduling and Memory Management for LLM Inference on Superchips 🧠Memory Allocators

supercomputing-system-ai-lab.github.io·2d·Hacker News

InferenceBench: A Benchmark for Open-Ended Inference Optimization by AI Agents 🦙llama.cpp

inferencebench.ai·5h·Hacker News

Lever: Speculative LLM Inference on Smartphones 🦙llama.cpp

AMD says its $4K Ryzen AI Halo workstation practically pays for itself ⚙️Zig

theregister.com·16h

The Inference Bottleneck: Architecting Kubernetes Autoscaling for Production LLMs 🦙llama.cpp

cloudnativenow.com·5d

tvall43/Qwen3.5-14B-A3B-Claude-4.6-Opus-Reasoning-Distilled-reap-gguf at main 🦙llama.cpp

huggingface.co·17h·r/LocalLLaMA

Understanding KV Cache: The Hidden Memory Cost of Serving LLMs 🦙llama.cpp

melchi.me·1d·Hacker News

GPU Memory Math for LLMs: Formula That Tells You What Fits on Your GPU 🧠Memory Allocators

theahmadosman.substack.com·7h·Substack, r/LocalLLaMA

Command A+: Making sovereign agentic capabilities available to all ⚙️Zig

cohere.com·12h·Hacker News

chiennv2000/orthrus: Fast, lossless LLM inference via dual-view diffusion decoding. 🦙llama.cpp

github.com·5d·Hacker News

Coding Agent Inference Benchmark Revealed 🤖AI

startuphub.ai·1d

I tried 4 LLM speedup techniques on CPU. Three made it slower. 🦙llama.cpp

deemwar-products.github.io·9h·Hacker News

KV Cache Is Becoming the Memory Hierarchy of Inference 🦙llama.cpp

touchdown-labs.com·2d

https://www.together.ai/blog/coding-agent-benchmarks 🦙llama.cpp

together.ai·5d

KV Cache and Flash Attention with interactive diagrams 🧠Memory Allocators

kvcache.cobanov.dev·9h·Hacker News

What GPU kernels mean for your distributed inference 🦙llama.cpp

developers.redhat.com·1d

How LLM Inference Works 🦙llama.cpp

arpitbhayani.me·6d·Hacker News

How I Shipped an Autonomous Agentic System on a 2026 Serverless-GPU Stack 🦙llama.cpp

·2d
