Scour
🚀 LLM Deployment (Specific): model serving, inference optimization, quantization, vLLM
Scoured 220 posts in 9.0 ms

I've updated my glorified Llama fork (LLM Inference Server) for P40's to utilise MTP + TurboQuant + DFlash
⚡ Quantization · github.com · 4d · r/LocalLLaMA

LLM Inference
🧠 LLMs · iop.systems · 1h

OSCAR: Offline Spectral Covariance-Aware Rotation for 2-bit KV Cache Quantization
⚡ Quantization · arxiv.org · 2d

I tried 4 LLM speedup techniques on CPU. Three made it slower.
🎯 LLM Finetuning · deemwar-products.github.io · 9h · Hacker News

InferenceBench: A Benchmark for Open-Ended Inference Optimization by AI Agents
💻 Local AI · inferencebench.ai · 4h · Hacker News

SuperInfer: SLO-Aware Rotary Scheduling and Memory Management for LLM Inference on Superchips
🎯 LLM Finetuning · supercomputing-system-ai-lab.github.io · 2d · Hacker News

GPU Memory Math for LLMs: Formula That Tells You What Fits on Your GPU
⚡ Quantization · theahmadosman.substack.com · 7h · Substack, r/LocalLLaMA

Recent Developments in LLM Architectures: KV Sharing, mHC, and Compressed Attention
⚙️ Transformers · magazine.sebastianraschka.com · 4d · Hacker News, r/LocalLLaMA

Coding Agent Inference Benchmark Revealed
💻 Local AI · startuphub.ai · 1d

Four-Tier Memory Hierarchy for LLM Reasoning (USC, UW)
💻 Local AI · semiengineering.com · 10h

Command A+: Making sovereign agentic capabilities available to all
🤖 AI Agents · cohere.com · 11h · Hacker News

Building a Controllable Inference Platform on Kubernetes with AI Runway
💻 Local AI · techcommunity.microsoft.com · 2d

ROCm 7 on Strix Halo: Benchmarking the New Toolbox Images
🎯 LLM Finetuning · sleepingrobots.com · 4d

Understanding KV Cache: The Hidden Memory Cost of Serving LLMs
⚡ Quantization · melchi.me · 1d · Hacker News

Intel llm-scaler-vllm PV 1.4 Released With Updated Components, Arc Pro B70 Support
🔬 Small LMs · phoronix.com · 17h

KV Cache Is Becoming the Memory Hierarchy of Inference
⚡ Quantization · touchdown-labs.com · 2d

froggeric/Qwen3.6-27B-MTP-GGUF
⚡ Quantization · huggingface.co · 3d · DEV

I built a catalog of portable AI capability packs for coding agents. Is this useful or too abstract?
🤖 AI Agents · doramagic.ai · 15h · r/SideProject

Local LLMs are ready for real work
🎯 LLM Finetuning · thelurkreport.beehiiv.com · 2d · r/LocalLLaMA

DeepSeek Agent Harness: Technical deep-dive & the open-source blueprint
🤖 AI Agents · dlcmh.github.io · 2h · Hacker News