⚙️ Inference
model inference, serving, quantization, throughput, vLLM
Scoured 149,456 posts in 11.4 ms

Overcoming inference challenges
🔀 LoRA · redhat.com · 3d

Scheduling the Unschedulable: Taming Black-Box LLM Inference at Scale
🚀 MLOps · arxiv.org · 1d

The Engine Behind Modern LLM Inference, Part 1: Continuous Batching, PagedAttention, and the End of…
💬 LLMs · medium.com · 15h

F&S M.2 AI Accelerator Uses NXP Ara-240 for Edge Inference Workloads
📊 AI Evals · linuxgizmos.com · 4h

Inside LLM Inference: KV Cache, Prefill, and the Decode Bottleneck
💬 LLMs · pub.towardsai.net · 1d

milanm/AutoGrad-Engine: A complete GPT language model (training and inference) in ~600 lines of pure C#, zero dependencies
💬 LLMs · github.com · 17h · Hacker News

LLM inference engine from scratch in C++
💬 LLMs · anirudhsathiya.com · 4d · Hacker News

I Ran My KYB Engine at Three Quantization Levels. Accuracy Didn't Move. Cost Dropped 6x.
📊 AI Evals · walsenburgtech.com · 15h · Hacker News

We Put a Gaming Box in the Inference Loop
📊 AI Evals · write.as · 2d

Prediction: The "Inference Supercycle" Could Be Bigger Than the Training Boom. 1 Growth Stock to Own.
🔀 LoRA · finance.yahoo.com · 15h

RetroInfer: A Vector Storage Engine for Scalable Long-Context LLM Inference
💬 LLMs · vldb.org · 1d

Inference Arena – new benchmark of local inference and training
🚀 MLOps · kvark.github.io · 4d · Hacker News

New course: Efficient Inference with SGLang: Text and Image Generation, built in partnership with LMSys @lmsysorg and RadixArk @radixark, and taught by Richard ...
💬 LLMs · twitter.macworks.dev · 15h

Building the Blueprint for Premium Inference
📊 AI Evals · sambanova.ai · 1d

How to achieve P90 sub-microsecond latency in a C++ FIX engine
🎯 Fine-Tuning · akinocal1.substack.com · 11h · Substack

The case for Model-as-a-Service over self-managed inference
🚀 MLOps · news.ycombinator.com · 3d · Hacker News

Attn-QAT: Making 4-Bit Attention Actually Work
🎯 Fine-Tuning · haoailab.com · 1d

Meta’s Muse Spark: a smaller, faster AI model for broad app deployment
📊 AI Evals · infoworld.com · 16h

UCCL-EP: Portable Expert-Parallel Communication
🚀 MLOps · uccl-project.github.io · 2d · Hacker News

TurboQuant Is Quietly Solving LLM Inference’s Worst Memory Problem
💬 LLMs · medium.com · 5d