Scour
🚀 LLM Deployment: model serving, inference optimization, quantization, vLLM

Scoured 150,778 posts in 11.1 ms

vLLM introduces memory optimizations for long-context inference
💻 Local AI · github.com · 5d · Hacker News

Initialisation Determines the Basin: Efficient Codebook Optimisation for Extreme LLM Quantization
🎯 LLM Finetuning · arxiv.org · 11h

UCCL-EP: Portable Expert-Parallel Communication
💻 Local AI · uccl-project.github.io · 2d · Hacker News

I Ran My KYB Engine at Three Quantization Levels. Accuracy Didn't Move. Cost Dropped 6x.
🎯 LLM Finetuning · walsenburgtech.com · 22h · Hacker News (123)

Hermes Agent Local AI Setup Guide with Qwen3.5 + OpenWebUI
💻 Local AI · youtube.com · 1d

How Do You Actually Scale High-Throughput LLM Serving in Production with vLLM?
🎯 LLM Finetuning · medium.com · 4d

Friday Five — April 10, 2026
🏢 LLM Adoption · redhat.com · 15h

Attn-QAT: Making 4-Bit Attention Actually Work
🎯 LLM Finetuning · haoailab.com · 1d

The case for Model-as-a-Service over self-managed inference
💻 Local AI · news.ycombinator.com · 3d · Hacker News

Breaking the Memory Wall: TurboQuant KV Cache Quantization on Apple Silicon
🎯 LLM Finetuning · pub.towardsai.net · 1d

Dockerizing ML Models: A Data Engineer’s Guide to Model Serving
🧠 LLMs · medium.com · 4d

From one Rust crate to an ecosystem spanning LangChain, PyTorch, FAISS, vLLM, 11 vector databases…
💻 Local AI · medium.com · 4d

Dual-Pool Token-Budget Routing for Cost-Efficient and Reliable LLM Serving
💻 Local AI · arxiv.org · 11h

TurboQuant: The Compression Algorithm That Just Made Your Vector Database Obsolete
📐 Vector Search · danwichoudhary.medium.com · 4d

[RFC]: vLLM IR: A Functional Intermediate Representation for vLLM · Issue #32358
🎯 LLM Finetuning · github.com · 2d · Hacker News

Bit-by-Bit: Progressive QAT Strategy with Outlier Channel Splitting for Stable Low-Bit LLMs
🎯 LLM Finetuning · arxiv.org · 11h

BerriAI/litellm v1.82.3-stable.patch.4
🎯 LLM Finetuning · github.com · 14h

How to Run ANY AI Model on Your Computer WITHOUT a GPU
🔓 Open Source AI · youtube.com · 5d

Efficient Quantization of Mixture-of-Experts with Theoretical Generalization Guarantees
🧠 LLMs · arxiv.org · 1d

QaRL: Rollout-Aligned Quantization-Aware RL for Fast and Stable Training under Training-Inference Mismatch
💻 Local AI · arxiv.org · 11h