🤖 LLM Inference
Specific
Model Serving, Quantization, vLLM, ONNX Runtime
Scoured 201 posts in 15.0 ms
🤖 AI
GitHub · 6 days ago
I got tired of not understanding how vLLM works under the hood, so I built my own mini inference engine from scratch.
Discussed on r/LLM
🔮 Speculative Decoding
NVIDIA Technical Blog · 2 days ago
Boost Inference Performance up to 15x on NVIDIA Blackwell Using DFlash Speculative Decoding
Covers 4 stories, including NVIDIA Blackwell Architecture
⏱️ Latency Engineering
Phoronix · 23 hours ago
AMD Contributes ONNX Runtime Backend To FFmpeg DNN Filter
✂️ Prefill Disaggregation
arXiv · 5 hours ago
SharQ: Bridging Activation Sparsity and FP4 Quantization for LLM Inference
🤖 AI
Hugging Face · 1 day ago
Could you help me test MTP for GLM-4.7-Flash?
Discussed on r/LocalLLaMA
🤖 AI
David Noel Ng · 4 days ago
2x GH200 for LLM inference, Part 3: GLM-5.2, expert offload, and the CPU question
⚡ Flash Attention
vucense.com · 13 hours ago
TurboQuant on Windows and LM Studio 2026: Complete Setup Guide
Covers 2 stories, including Discover and run local LLMs
✂️ Prefill Disaggregation
medium.com · 1 day ago
The Hidden Memory Problem Behind Fast LLM Inference
✂️ Prefill Disaggregation
fitservers.com · 6 days ago
The Complete Guide to Deploying DeepSeek R1 on a Dedicated Server
🔮 Speculative Decoding
Modal · 2 days ago
Achieve state-of-the-art inference latencies with speculative decoding
Covers DFlash: Block Diffusion for Flash Speculative Decoding
Less-relevant results
✂️ Prefill Disaggregation
nextbigfuture.com · 17 hours ago
Optimus Teslabot Would Be an Edge Computing Beast
✂️ Prefill Disaggregation
medium.com · 4 days ago
Debugging Deployments with Gemma 12B, TPU v6e-1, MCP, and Antigravity CLI
✂️ Prefill Disaggregation
lemmy.ml · 1 day ago
DualPath: Breaking the Storage Bandwidth Bottleneck in Agentic LLM Inference
✂️ Prefill Disaggregation
blog.skypilot.co · 2 days ago
SkyPilot Endpoints: Production-Ready Inference on Every Cluster You Own
Discussed on Hacker News
🤖 Agentic AI
medium.com · 6 days ago
vLLM, Function Calling, and World Models explained
✂️ Prefill Disaggregation
Red Hat Developer · 2 days ago
Optimizing distributed AI inference: Advanced deployment patterns
Covers 3 stories, including DeepSeek-V3 Technical Report
✂️ Prefill Disaggregation
Ubuntu · 5 days ago
Developing web apps with local LLM inference
✂️ Prefill Disaggregation
OpenAI News · 2 days ago
OpenAI and Broadcom unveil LLM-optimized inference chip
Covered by Mark Smith's Blog Feed
🤖 AI
Hugging Face · 9 hours ago
Run a vLLM Server on HF Jobs in One Command
Covers 2 stories, including Pi.dev: There are many coding agents, but this one is mine
✂️ Prefill Disaggregation
supercomputing-system-ai-lab.github.io · 2 days ago
VoltanaLLM: Energy-Efficient LLM Serving
Discussed on Hacker News