Scour
⚡ LLM Serving (Specific)
LLM inference, vLLM, model serving, TensorRT-LLM
Scoured 214 posts in 21.2 ms

🔬 Deep Learning · GitHub · 3 days ago
I got tired of not understanding how vLLM works under the hood, so I built my own mini inference engine from scratch.
Discussed on r/LLM

🔬 Deep Learning · ubuntu.com · 1 day ago
Developing web apps with local LLM inference

🖥️ GPU Computing · Red Hat Developer · 22 hours ago
Designing distributed AI inference: Core concepts and scaling dimensions

🖥️ GPU Computing · medium.com · 19 hours ago
Debugging Deployments with Gemma 12B, TPU v6e-1, MCP, and Antigravity CLI

🤖 AI Agents · medium.com · 2 days ago
vLLM, Function Calling, and World Models explained

🏗️ Systems Design · Anyscale blog posts · 4 days ago
High Performance Distributed Inference with Ray Serve LLM
Covered by Google Cloud Blog
Discussed on Hacker News

📈 LLM Scaling · lmsys.org · 6 days ago
DFlash and Spec V2 Decoding (14 minute read)
Covers 6 stories, including: Looking for a self-hosted alternative to Modal.com for running ML workloads
Discussed on Hacker News

📊 Machine Learning · Gradient Ascent · 1 day ago
Groq on Endless Compute, Inside Claude's Mind, and GLM-5.2 Open Weights - The Tokenizer Edition #32
Covers 3 stories, including: alibaba/open-code-review: Battle-tested at Alibaba's scale. Hybrid architecture code review tool: deterministic pipelines + LLM Agent, precise line-level comments, built-in fine-tuned ruleset (NPE, thread-safety, XSS, SQL injection), OpenAI & Anthropic compatible.
Less-relevant results

🖥️ GPU Computing · graphsignal.com · 22 hours ago
CUDA Profiler for Production Inference
Discussed on Hacker News

⚙️ MLOps · thecybersidekick.beehiiv.com · 4 days ago
AI Inference at the Edge: Running Real-Time LLMs in Kubernetes Without a GPU Farm
Discussed on DEV

🤖 AI Agents · medium.com · 2 days ago
The Context Budget That Will Decide Everyday AI

🗄️ Vector Databases · moorcheh.ai · 8 hours ago
Information-Theoretic Vector Search Is Having Its Moment
Covered by GitHub
Discussed on Hacker News

🖥️ GPU Computing · arxiv.org · 6 days ago
SwiftCache: Efficient LLM Serving for Multi-turn Conversations with Heterogeneous KV Cache Sharing

🏗️ Systems Design · Google Cloud Blog · 4 days ago
Scaling Ray Serve LLM on GKE: Performance without losing the developer experience

🧠 Transformer Architecture · whyopensource.ai · 7 hours ago
A running list of reasons to move to open source
Covers 3 stories, including: Statement on the US government directive to suspend access to Fable 5 and Mythos 5
Discussed on Hacker News

🔬 Deep Learning · youtube.com (video) · 4 days ago
Token Injection: Crashing LLM Inference With Special Tokens

📈 LLM Scaling · portal.neuralwatt.com · 1 day ago
Neuralwatt: Energy-based pricing for AI inference. Efficient prompts cost less
Discussed on Hacker News

🧠 Transformer Architecture · fitservers.com · 3 days ago
The Complete Guide to Deploying DeepSeek R1 on a Dedicated Server

✍️ Prompt Engineering · pi.dev · 2 days ago
Pi 0.79.9

🏗️ Systems Design · Blocks and Files · 10 hours ago
Dell and data physics