LLM Inference

Feeds to Scour
Scoured 142 posts in 44.1 ms

llama.cpp vs. vLLM: Choosing the right local LLM inference engine

 KV Cache
developers.redhat.com · Covers 7 stories

67% Cost Savings with PD Disaggregation Using Ray and vLLM on AMD MI325X

 KV Cache  Content type: Blog
anyscale.com · Hacker News

AI Inference at the Edge: Running Real-Time LLMs in Kubernetes Without a GPU Farm

 KV Cache  Content type: Blog

JetFlow: Breaking the Scaling Ceiling of Speculative Decoding with Parallel Tree Drafting

 KV Cache  Content type: Academic
arxiv.org

Parallelize speculative decoding with P-EAGLE on Amazon SageMaker AI

 KV Cache  Content type: Blog
aws.amazon.com

ahwurm/localharness: Model-agnostic agent harness for local LLMs — configure agents in YAML and run them on your own hardware (vLLM, Ollama, LM Studio, llama.cpp).

 KV Cache  Content type: Code
github.com · Hacker News · Covers: uv

Unlocking Extreme AMD Instinct Inference with Software-Hardware Co-Optimization

 💻Software Engineering  Content type: Blog

Most people use Ollama or llama.cpp for local LLMs, but these are the tools I switch to when it gets serious

 KV Cache

Deploying NVIDIA Nemotron-3 Ultra 550B, with B200 GPUs, vLLM on Google Kubernetes Engine — Football…

 KV Cache  Content type: Blog
medium.com

RAG Observability with Langfuse, vLLM, and FAISS

 🔍RAG
pyimagesearch.com

vLLM Internalised: The Mechanics of Modern LLM Inference

 KV Cache  Content type: Blog
medium.com

Less-relevant results

Speculative Decoding | LM Studio

 💬LLMs
lmstudio.ai

Green AI: Speculative Decoding as an Environmental Necessity

 🤖AI Agents

EfficientRollout: System-Aware Self-Speculative Decoding for RL Rollouts

 🔧MLOps  Content type: Academic
arxiv.org

A brief history of KV cache compression developments

 KV Cache  Content type: Blog

All sorts of famous Attention Layers

 💬LLMs  Content type: Blog
