Scour
🤖 LLM Inference
Specific
Model Serving, Quantization, vLLM, ONNX Runtime
Scoured 190 posts in 6.6 ms
llm-inference-at-scale/content/00_foundations/00.1_why_llm_inference_is_different/why_llm_inference_is_different.md at master · harshuljain13/llm-inference-at-scale
🔮 Speculative Decoding
github.com · 1d · Hacker News
Benchmarking AI inference on CPUs: A transparent blueprint for the enterprise
⚙️ Performance Profiling
next.redhat.com · 20h
Characterization of machine learning compilers for LLM inference on NVIDIA GPUs
🔮 Speculative Decoding
link.springer.com · 5d · Hacker News
Databricks’ Model Units Redefine LLM Inference Economics, But Can Reliability Scale?
🔮 Speculative Decoding
futurumgroup.com · 5h
SiDP: Memory-Efficient Data Parallelism for Offline LLM Inference
🔮 Speculative Decoding
arxiv.org · 1d
Local LLM Deployment: Ollama vs vLLM vs LM Studio Compared
🪟 Tauri
sitepoint.com · 23h
LoRA vs Adapter vs Prefix Tuning: PEFT Memory Comparison
🔮 Speculative Decoding
tildalice.io · 6d
OpenCode Now Supports DigitalOcean Inference Router for Intelligent Model Routing
📡 Edge AI
digitalocean.com · 21h
Reliable LLM Inference at Scale
📡 Edge AI
databricks.com · 1d
Booming AI Revenues Boost Inference Startups to Decacorn Status
📡 Edge AI
newcomer.co · 3h
Reachy Mini goes fully local
⏱️ Latency Engineering
huggingface.co · 21h · Hacker News
The same 16 GPUs, twice the users: Inference-aware routing for LLM clusters
📡 Edge Computing
redhat.com · 2d
Running LLMs locally on a Mac
⚙️ MLOps
danmackinlay.name · 6d
Real-time LLM Inference on Standard GPUs (3,000 tokens/s per request)
🔮 Speculative Decoding
blog.kog.ai · 1d · Hacker News
Reinforcement Learning is an Infrastructure Problem
📊 Performance Tools
modal.com · 18h
Nvidia Dynamo Snapshot: Fast Startup for Inference Workloads on Kubernetes
🚀 Performance
developer.nvidia.com · 1d · Hacker News
Cohere Open-Sourced Command A+, a 218B MoE Model Built for Enterprise Agents
🎯 AI Agents
firethering.com · 6d · Hacker News
Running AI inference on Rebellions ATOM NPU with Red Hat AI
🍓 Raspberry Pi Clusters
developers.redhat.com · 2d
Global Fixed Point DSP Market Size, Industry Share, Trends & Forecast 2026-2034
💾 Embedded Systems
verifiedmarketreports.com · 5d
PyTorch AOTInductor Hybrid Lowering
📊 Performance Tools
leimao.github.io · 1d