🤖 LLM Inference
Specific
Model Serving, Quantization, vLLM, ONNX Runtime
I built a catalog of portable AI capability packs for coding agents. Is this useful or too abstract?
🤖
AI
doramagic.ai
·
16h
·
r/SideProject
Building a Controllable Inference Platform on Kubernetes with AI Runway
🤖
AI
techcommunity.microsoft.com
·
2d
Qwen’s MTP test puts local AI back in startup math
🦙
llama.cpp
startupfortune.com
·
5d
Intel llm-scaler-vllm PV 1.4 Released With Updated Components, Arc Pro B70 Support
🦙
llama.cpp
phoronix.com
·
18h
DeepSeek V4 Flash: Bringing Frontier AI to the Home
🦙
llama.cpp
blog.jonathanpage.com
·
2d
·
Hacker News
Let AI Agents Write Your Serving Stack with VibeServe
🦙
llama.cpp
syfi.cs.washington.edu
·
6d
·
Hacker News
CohereLabs/command-a-plus-05-2026-bf16
🦙
llama.cpp
huggingface.co
·
13h
·
r/LocalLLaMA
Eliminate LLM Cold starts: Load models up to 6x Faster with Azure Blob Storage and Run:AI Model Streamer
🦙
llama.cpp
devblogs.microsoft.com
·
1d
Best Local LLMs for Mac in 2026 — M1, M2, M3, M4 Tested
🧠
Memory Allocators
insiderllm.com
·
4d
Build real-time voice applications with Amazon SageMaker AI and vLLM
🤖
AI
aws.amazon.com
·
11h
Ollama vs vLLM vs llama.cpp: Which Wins for Your Use Case
🦙
llama.cpp
tildalice.io
·
5d
Snowflake Batch Inference at Scale with SPCS and Ray
🦙
llama.cpp
snowflake.com
·
2d
Local LLMs are ready for real work
🦙
llama.cpp
thelurkreport.beehiiv.com
·
2d
·
r/LocalLLaMA
Cerebras says its chips run a trillion-parameter AI model nearly 7 times faster than GPU clouds
🦙
llama.cpp
venturebeat.com
·
9h
Discover the Red Hat OpenShift AI model catalog
🐯
TigerBeetle
redhat.com
·
3d
VeriCache: Turning Lossy KV Cache into Lossless LLM Inference
🦙
llama.cpp
arxiv.org
·
2d
not much happened today
🦙
llama.cpp
news.smol.ai
·
5d
Cohere cracks lossless quantization and native citations with first full Apache 2.0 licensed open model Command A+
⚙️
Zig
venturebeat.com
·
7h
Cerebras: The $56.4 Billion IPO Challenging NVIDIA’s Memory Wall
🧠
Memory Allocators
artificialintelligencemadesimple.com
·
2d
Build a Production-Grade Local LLM Stack (vLLM + CUDA + KV Cache Tuning)
🦙
llama.cpp
medium.com
·
5d