LLM Serving
Specific
LLM inference, vLLM, model serving, TensorRT-LLM
Scoured 186 posts in 14.3 ms
⚡ LLM Optimization · arXiv · 1 day ago
CrossPool: Efficient Multi-LLM Serving for Cold MoE Models through KV-Cache and Weight Disaggregation
⚡ LLM Optimization · fitservers.com · 6 days ago
The Complete Guide to Deploying DeepSeek R1 on a Dedicated Server
🧠 LLM Tooling · GitHub · 23 hours ago
For users with 4x-8x 6000 PROs, how is your experience with bigger models lately? (GLM 5.2, Kimi 2.7, DeepSeek V4 Pro)
Discussed on r/LocalLLaMA
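🧠 LLM Tooling · IT之家 · 2 hours ago
Huawei and China Mobile Hubei complete the first live-network test of an AI inference acceleration solution by a Chinese telecom operator, raising long-sequence token throughput by 372%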
🤖 LLM, Agent · lemmy.ml · 1 day ago
DualPath: Breaking the Storage Bandwidth Bottleneck in Agentic LLM Inference
🤖 AI · Hugging Face · 1 hour ago
Run a vLLM Server on HF Jobs in One Command
Covers 2 stories, including: Pi.dev: There are many coding agents, but this one is mine
🔧 Tool Use · medium.com · 3 days ago
Debugging Deployments with Gemma 12B, TPU v6e-1, MCP, and Antigravity CLI
⚡ LLM Optimization · medium.com · 1 day ago
The Hidden Memory Problem Behind Fast LLM Inference
⚙️ AI Engineering · Red Hat Developer · 2 days ago
Optimizing distributed AI inference: Advanced deployment patterns
Covers 3 stories, including: DeepSeek-V3 Technical Report
🧠 LLM Reasoning · medium.com · 5 days ago
vLLM, Function Calling, and World Models explained
🧠 LLM Tooling · GitHub · 14 hours ago
Show HN: ParseHawk – 100% Local Document AI with API, CLI, and Web UI
Covers 2 stories, including: Installation
Discussed on Hacker News
🗣️ Large Language Models · blog.skypilot.co · 2 days ago
SkyPilot Endpoints: Production-Ready Inference on Every Cluster You Own
Discussed on Hacker News
Less-relevant results
🧠 LLM Tooling · vucense.com · 4 hours ago
TurboQuant on Windows and LM Studio 2026: Complete Setup Guide
Covers 2 stories, including: Discover and run local LLMs
⚙️ AI Engineering · blocksandfiles · 1 day ago
DDN launches faster array HW and KV Cache SW for AI
🧠 LLM Tooling · David Noel Ng · 4 days ago
2x GH200 for LLM inference, Part 3: GLM-5.2, expert offload, and the CPU question
🧠 LLM Tooling · primeintellect.ai · 2 days ago
RL at 1T Scale: prime-rl Performance Deep Dive
Covers 6 stories, including: Kimi K2.7-Code: open-source coding model with better token efficiency
🤖 AI Development · Vik's Newsletter · 15 hours ago
What AI Inference Actually Demands From a NAND SSD
🔄 AI Workflows · medium.com · 5 days ago
The Context Budget That Will Decide Everyday AI
🧠 Agent Memory · medium.com · 1 day ago
PolyKV: We Gave 15 AI Agents One Shared Memory and It Actually Worked
💭 Context Management · medium.com · 1 day ago
Inside TurboQuant: The Algorithmic Breakthrough Smashing LLM Memory Walls