Scour
🚀 LLM Deployment
Specific
model serving, inference optimization, quantization, vLLM
Scoured 302 posts in 16.4 ms
I built a catalog of portable AI capability packs for coding agents. Is this useful or too abstract?
🤖
AI Agents
doramagic.ai
·
15h
·
r/SideProject
How I Shipped an Autonomous Agentic System on a 2026 Serverless-GPU Stack
⚡
Quantization
medium.com
·
2d
https://www.together.ai/blog/coding-agent-benchmarks
💻
Local AI
together.ai
·
5d
DeepSeek Agent Harness: Technical deep-dive & the open-source blueprint
🤖
AI Agents
dlcmh.github.io
·
2h
·
Hacker News
Snowflake Batch Inference at Scale with SPCS and Ray
💻
Local AI
snowflake.com
·
2d
I replaced GitHub Copilot with a self-hosted AI and I won’t go back
🛡️
AI Safety
xda-developers.com
·
9h
What GPU kernels mean for your distributed inference
💻
Local AI
developers.redhat.com
·
1d
Why Shrinking an AI Model Often Makes It More Useful
🏢
LLM Adoption
siliconopera.com
·
19h
Recent Developments in LLM Architectures: KV Sharing, mHC, and Compressed Attention
⚙️
Transformers
magazine.sebastianraschka.com
·
4d
·
Hacker News, r/LocalLLaMA
KV Cache and Flash Attention with interactive diagrams
⚡
Quantization
kvcache.cobanov.dev
·
9h
·
Hacker News
LLM Observability with Self-Hosted Langfuse and vLLM
💻
Local AI
pyimagesearch.com
·
2d
Ollama vs vLLM vs llama.cpp: Which Wins for Your Use Case
💻
Local AI
tildalice.io
·
5d
I built Mofakir: A native, local AI desktop assistant for Linux that actually interacts with your system
💻
Local AI
github.com
·
5h
·
r/linux
Multi-Token Prediction (MTP)
🧠
LLMs
sebastianraschka.com
·
1d
Qwen’s MTP test puts local AI back in startup math
💻
Local AI
startupfortune.com
·
5d
DeepSeek V4 Flash: Bringing Frontier AI to the Home
⚡
Quantization
blog.jonathanpage.com
·
2d
·
Hacker News
ROCm 7 on Strix Halo: Benchmarking the New Toolbox Images
🎯
LLM Finetuning
sleepingrobots.com
·
4d
VeriCache: Turning Lossy KV Cache into Lossless LLM Inference
⚡
Quantization
arxiv.org
·
2d
How LLM Inference Works
🧠
LLMs
arpitbhayani.me
·
6d
·
Hacker News
Eliminate LLM Cold starts: Load models up to 6x Faster with Azure Blob Storage and Run:AI Model Streamer
⚡
Quantization
devblogs.microsoft.com
·
1d