LLM Serving
Specific
LLM inference, vLLM, model serving, TensorRT-LLM
Scoured 215 posts in 18.8 ms
📈 LLM Scaling
arXiv · 17 hours ago
HERALD: High-Throughput Block Diffusion LLM Serving via CPU-GPU Cooperative KV Cache Retrieval
Less-relevant results
🗄️ Vector Databases
moorcheh.ai · 1 day ago
Information-Theoretic Vector Search Is Having Its Moment
Covered by GitHub
Discussed on Hacker News
🤖 AI Agents
medium.com · 3 days ago
vLLM, Function Calling, and World Models explained
🧠 Transformer Architecture
GitHub · 5 hours ago
I mapped the KLD of KV cache quantization for Qwen3.6-35B-A3B and Gemma4-E2B QAT
Discussed on r/LocalLLaMA
🔬 Deep Learning
mstar.stanford.edu · 5 days ago
M* (M-Star): A Modular, Extensible, Serving System for Multimodal Models
Discussed on Hacker News
📈 LLM Scaling
venturebeat.com · 1 day ago
AI hit the memory wall — now it needs a new context tier
🖥️ GPU Computing
digitalocean.com · 21 hours ago
The HBM Tax: Why Vision Encoders and Language Decoders Fight Over Your GPU
🖥️ GPU Computing
Rack to Cloud · 3 hours ago
GPU Scarcity Isn't the Problem Anymore. GPU Allocation Governance Is.
Discussed on DEV
🧠 Transformer Architecture
whyopensource.ai · 1 day ago
A running list of reasons to move to open source
Covers 3 stories, including: Statement on the US government directive to suspend access to Fable 5 and Mythos 5
Discussed on Hacker News
🔬 Deep Learning
towardsdeeplearning.com · 6 days ago
Green AI: Speculative Decoding as an Environmental Necessity
📈 LLM Scaling
portal.neuralwatt.com · 2 days ago
Neuralwatt: Energy-based pricing for AI inference. Efficient prompts cost less
Discussed on Hacker News
🏗️ Systems Design
Hugging Face · 5 days ago
225B-A23B
Covered by mail.bycloud.ai, news.smol.ai
Discussed on r/LocalLLaMA
📈 LLM Scaling
arXiv · 17 hours ago
EnerInfer: Energy-Aware On-Device LLM Inference
📈 LLM Scaling
medium.com · 2 days ago
One Number Lies: How to Actually Measure LLM Inference
🧠 Transformer Architecture
medium.com · 5 days ago
The Transformer Pipeline: A Complete Mathematical and Visual Guide
🧠 Transformer Architecture
Red Hat Developer · 21 hours ago
Connect EvalHub to protected production model servers
📈 LLM Scaling
machine-learning-made-simple.medium.com · 19 hours ago
The Real Cost of Running AI: From FLOPs to GPUs to the KV Cache
🔬 Deep Learning
abhishek.it · 4 days ago
Running GLM-5.2 5x faster at 500tps with limitation
Discussed on Hacker News
📊 Machine Learning
GitHub · 2 days ago
Show HN: Alloy – a PyTorch backend and inference engine for Apple Silicon
Discussed on Hacker News
🤖 AI Agents
medium.com · 3 days ago
The Context Budget That Will Decide Everyday AI