🚀 LLM Deployment
Specific
model serving, inference optimization, quantization, vLLM
Scoured 302 posts in 11.4 ms
The Inference Bottleneck: Architecting Kubernetes Autoscaling for Production LLMs
💻 Local AI · cloudnativenow.com · 5d
LLM Inference
🧠 LLMs · iop.systems · 2h
OSCAR: Offline Spectral Covariance-Aware Rotation for 2-bit KV Cache Quantization
⚡ Quantization · arxiv.org · 2d
I tried 4 LLM speedup techniques on CPU. Three made it slower.
🎯 LLM Finetuning · deemwar-products.github.io · 10h · Hacker News
InferenceBench: A Benchmark for Open-Ended Inference Optimization by AI Agents
💻 Local AI · inferencebench.ai · 5h · Hacker News
SuperInfer: SLO-Aware Rotary Scheduling and Memory Management for LLM Inference on Superchips
🎯 LLM Finetuning · supercomputing-system-ai-lab.github.io · 2d · Hacker News
I've updated my glorified Llama fork (LLM Inference Server) for P40's to utilise MTP + TurboQuant + DFlash
⚡ Quantization · github.com · 4d · r/LocalLLaMA
GPU Memory Math for LLMs: Formula That Tells You What Fits on Your GPU
⚡ Quantization · theahmadosman.substack.com · 7h · Substack, r/LocalLLaMA
Coding Agent Inference Benchmark Revealed
💻 Local AI · startuphub.ai · 1d
Four-Tier Memory Hierarchy for LLM Reasoning (USC, UW)
💻 Local AI · semiengineering.com · 11h
KV Cache Optimization: 3x Faster LLM Inference on 24GB VRAM
⚡ Quantization · tildalice.io · 6d
Building a Controllable Inference Platform on Kubernetes with AI Runway
💻 Local AI · techcommunity.microsoft.com · 2d
Command A+: Making sovereign agentic capabilities available to all
🤖 AI Agents · cohere.com · 12h · Hacker News
Understanding KV Cache: The Hidden Memory Cost of Serving LLMs
⚡ Quantization · melchi.me · 1d · Hacker News
Let AI Agents Write Your Serving Stack with VibeServe
💻 Local AI · syfi.cs.washington.edu · 6d · Hacker News
Intel llm-scaler-vllm PV 1.4 Released With Updated Components, Arc Pro B70 Support
🔬 Small LMs · phoronix.com · 18h
KV Cache Is Becoming the Memory Hierarchy of Inference
⚡ Quantization · touchdown-labs.com · 2d
CohereLabs/command-a-plus-05-2026-bf16
💻 Local AI · huggingface.co · 13h · r/LocalLLaMA
Build a Production-Grade Local LLM Stack (vLLM + CUDA + KV Cache Tuning)
🎯 LLM Finetuning · medium.com · 5d
Local LLMs are ready for real work
🎯 LLM Finetuning · thelurkreport.beehiiv.com · 2d · r/LocalLLaMA