Scour
🤖 LLM Inference: Model Serving, Quantization, vLLM, ONNX Runtime
Scoured 170,929 posts in 13.2 ms
Flow-Controlled Scheduling for LLM Inference with Provable Stability Guarantees
⚡ Inference Optimization · arxiv.org · 1d
RetroInfer: A Vector Storage Engine for Scalable Long-Context LLM Inference
⚡ Inference Optimization · vldb.org · 6d
Introducing dotLLM - Building an LLM Inference Engine in C#
⚙️ AI Infrastructure · kokosa.dev · 13h · Hacker News
I-DLM: Introspective Diffusion Language Models
👁️ Multimodal LLMs · introspective-diffusion.github.io · 21h · Hacker News, r/LocalLLaMA
amitshekhariitbhu/llm-internals: Learn LLM internals step by step - from tokenization to attention to inference optimization
👁️ Multimodal LLMs · github.com · 1d · Hacker News
AMD makes a big splash with the MI355X in MLPerf Inference 6.0: Over one million tokens per second in multi-node inference
⚡ Inference Optimization · igorslab.de · 1h
The Engine Behind Modern LLM Inference, Part 1: Continuous Batching, PagedAttention, and the End of…
⚡ Inference Optimization · medium.com · 5d
Stop benchmarking inference providers, a guide to easy evaluation
⚡ Inference Optimization · huggingface.co · 14h · r/LocalLLaMA
Three AIs enter. One survives. What a SIGKILL race reveals about inference speed
⚙️ AI Infrastructure · cline.ghost.io · 1d
MiniLM-L6-v2 on the JVM:
⚡ Inference Optimization · medium.com · 5h
Four Reasons Why FPGAs Hit the Sweet Spot for LLM Inference
⚡ Inference Optimization · pub.towardsai.net · 14h
OxiBonsai: The World’s First Pure Rust 1-Bit LLM Inference Engine
⚡ Inference Optimization · kitasanio.medium.com · 2d
Quantization, LoRA, and the 8% Problem: Benchmarking Local LLMs for Production AI
⚙️ AI Infrastructure · walsenburgtech.com · 3d · Hacker News
Model API Performance
⚡ Inference Optimization · news.ycombinator.com · 19h · Hacker News
Google Released Gemma 4 with a Focus On Local-First, On-Device AI Inference
⚙️ AI Infrastructure · infoq.com · 1d
The Global Optimum: An In-Depth Look at TurboQuant and KV Cache Compression
⚡ Inference Optimization · thegradientdescent.medium.com · 7h
Inside the Token Factory: A First-Principles Comparison of vLLM and SGLang
⚙️ AI Infrastructure · hxu296.github.io · 3d · Hacker News
Beyond Helpfulness: Specialized Fine-Tuning for Empathetic AI with Gemma 2B and QLoRA
⚙️ AI Infrastructure · ecorbari.medium.com · 2d
Taalas bets on hard-wired models to beat GPUs at inference
⚙️ AI Infrastructure · jonpeddie.com · 11h
From AGI to LLMs and hallucinations: unpacking confusing AI terms
⚙️ AI Infrastructure · digitaltoday.co.kr · 1d