🧠 LLM Inference
Specific
LLM serving, inference optimization, token generation, vLLM
Scoured 142 posts in 36.6 ms
vLLM Transformers Backend: Bridging Hugging Face Compatibility and High-Performance Inference
⚡ KV Cache · Content type: Blog · odsc.medium.com · 6 days ago
coder543/command-a-plus-05-2026-gguf
⚡ KV Cache · huggingface.co · 3 days ago · r/LocalLLaMA
Covers: AlterLang InterCode: A Native Intercomprehension Paradigm in Programming, Powered by GuruDev; Command A+: Making sovereign agentic capabilities available to all; +1 more
Lemonade SDK Adds Nvidia CUDA Support
⚡ KV Cache · i-programmer.info · 1 day ago
Covers: Show HN: Lemonade: Run LLMs Locally with GPU and NPU Acceleration
Towards Scalable Customization and Deployment of Multi-Agent Systems for Enterprise Applications
🤖 AI Agents · Content type: Academic · arxiv.org · 12 hours ago
Built Uber aggregator that tracks top AI researchers and leaders
💬 LLMs · brightray.ai · 1 day ago · Hacker News
Native Coding Agent Optimized for Local LLM and DeepSeek v4 with Vector Memory
🔍 RAG · code.intellios.ai · 1 day ago · Hacker News · Cited by 1 article
Covers: I Improved 15 LLMs at Coding in One Afternoon. Only the Harness Changed.
How Zoho Labs pivoted to inference engineering
📄 ML Papers · yourstory.com · 4 days ago
Story of How Im Running an Unlimited $6/Month AI Provider on 4x RTX 3090s
⚡ KV Cache · Content type: Discussion · news.ycombinator.com · 4 days ago · Hacker News
Deploying NVIDIA Nemotron-3 Ultra 550B, with B200 GPUs, vLLM on Google Kubernetes Engine — Football…
⚡ KV Cache · Content type: Blog · ammettw.medium.com · 3 days ago
llama.cpp now supports model management (downloading etc) via API
🔧 MLOps · Content type: Code · github.com · 17 hours ago · r/LocalLLaMA
Solyx AI Grid: Hardware-Telemetry-Aware Routing Across Geographically Distributed GPU Clusters
⚡ KV Cache · Content type: Academic · arxiv.org · 2 days ago
Build Claude Alternative in Cloud in 20mins
⚡ KV Cache · Content type: Reference · docs.dagploy.com · 4 days ago · Hacker News
Covers: Qwen 3.6 27B is out
Linear Thinking, Nonlinear Costs
🤖 AI Agents · Content type: Blog · oreilly.com · 2 days ago
CrankGPT is an offline AI box for the apocalypse
🤖 AI Agents · boingboing.net · 2 days ago · Cited by 1 article
Covers: fully offline, human-powered local AI
[AINews] Fable and Mythos officially too dangerous to release
⚡ KV Cache · Content type: News · latent.space · 5 days ago
Covers: Statement on the US government directive to suspend access to Fable 5 and Mythos 5; DietrichGebert/ponytail: Makes your AI agent think like the laziest senior dev in the room. The best code is the code you never wrote.; +2 more
Mi50 32GB / GFX906 - vLLM Qwen 3.5 Configuration for Qwen 3.5:9B AWQ-4bit
⚡ KV Cache · huggingface.co · 6 days ago · r/LocalLLaMA
Covers: vllm-project/vllm; sgl-project/sglang; +2 more
RL Systems Mind the Gap: Matching Trainer and Generator Throughput
⚡ KV Cache · Content type: News · newsletter.semianalysis.com · 1 day ago · Cited by 1 article
Covers: GLM 5 is already on huggingface!; Dario Amodei — “We are near the end of the exponential”; +1 more
Plug-and-Adapt: Multimodal Coreference Resolution at First Sight with a Pretrained Alignment Model
⚡ KV Cache · Content type: Academic · arxiv.org · 1 day ago
Making a fleet of self-hosted LLM agents trustworthy
🌐 Distributed Systems · Content type: Blog · llmkube.com · 4 days ago · DEV
Is anyone else not finding the Web UI on latest (b9680) of llama.cpp?
💬 LLMs · Content type: Discussion, Code · github.com · 1 day ago · r/LocalLLaMA