Scour
⚡ Inference
LLM inference, model serving, vLLM, TensorRT, latency
Scoured 147,300 posts in 11.7 ms
LLM inference engine from scratch in C++
🧠 LLMs · anirudhsathiya.com · 4d · Hacker News
Scheduling the Unschedulable: Taming Black-Box LLM Inference at Scale
🔧 MLOps · arxiv.org · 1d
The Engine Behind Modern LLM Inference, Part 1: Continuous Batching, PagedAttention, and the End of…
🧠 LLMs · medium.com · 11h
Inside LLM Inference: KV Cache, Prefill, and the Decode Bottleneck
🧠 LLMs · pub.towardsai.net · 23h
RetroInfer: A Vector Storage Engine for Scalable Long-Context LLM Inference
🧠 LLMs · vldb.org · 1d
Overcoming inference challenges
📊 AI Models · redhat.com · 3d
milanm/AutoGrad-Engine: A complete GPT language model (training and inference) in ~600 lines of pure C#, zero dependencies
🧠 LLMs · github.com · 13h · Hacker News
The case for Model-as-a-Service over self-managed inference
🔧 MLOps · news.ycombinator.com · 2d · Hacker News
Tech links of April 2026
🌐 Distributed Systems · codeyarns.com · 52m
I Ran My KYB Engine at Three Quantization Levels. Accuracy Didn't Move. Cost Dropped 6x.
🔧 MLOps · walsenburgtech.com · 11h · Hacker News
KV Cache in LLM Inference: From PagedAttention (2023) to Reasoning Model Bottlenecks (2026)
🧠 LLMs · medium.com · 2d
Inference Arena – new benchmark of local inference and training
📊 AI Models · kvark.github.io · 4d · Hacker News
How to achieve P90 sub-microsecond latency in a C++ FIX engine
🌐 Distributed Systems · akinocal1.substack.com · 6h · Substack
We Put a Gaming Box in the Inference Loop
📊 AI Models · write.as · 1d
Prediction: The "Inference Supercycle" Could Be Bigger Than the Training Boom. 1 Growth Stock to Own.
📊 AI Models · finance.yahoo.com · 10h
Benchmarking inference of popular models on consumer hardware
🔧 MLOps · inferena.tech · 5d · Hacker News
Building the Blueprint for Premium Inference
📊 AI Models · sambanova.ai · 1d
Reducing P999 Latency in Distributed Databases with TiDB 8.5
🗄️ Databases · pingcap.com · 15h
UCCL-EP: Portable Expert-Parallel Communication
🔧 MLOps · uccl-project.github.io · 2d · Hacker News
Inside the LLM Black Box: The True Architecture of Latency and Cost
🧠 LLMs · akanuri.medium.com · 5d