Scour
🚀 LLM Deployment
Specific
model serving, inference optimization, quantization, vLLM
Scoured 302 posts in 22.3 ms
froggeric/Qwen3.6-27B-MTP-GGUF
⚡ Quantization
huggingface.co · 3d · DEV
Nitsum: Serving Tiered LLM Requests with Adaptive Tensor Parallelism
🎯 LLM Finetuning
mlsys.wuklab.io · 2d · Hacker News
DFlash: The Trick That Makes LLMs Stop Crawling One Token at a Time
🎯 LLM Finetuning
abvcreative.medium.com · 5d
Blazing fast on-device GenAI with LiteRT-LM
🔬 Small LMs
developers.googleblog.com · 1d
Ollama on Mac: Setup and Optimization Guide (2026)
🎯 LLM Finetuning
insiderllm.com · 4d
Will TurboQuant save us from the RAM apocalypse?
⚡ Quantization
blopig.com · 6d
Meta's WhatsApp Incognito Chat puts AI conversations in a black box
💻 Local AI
ppc.land · 3d
ImpactArbiter – A PyTorch autograd trap for LLM memory bugs
🎯 LLM Finetuning
github.com · 2d · Hacker News
An LLM on a Sony PSP
🧠 LLMs
granda.org · 5d
SpecSA: Bridging Speculative Decoding and Sparse Attention for Efficient LLM Inference
🧠 LLMs
arxiv.org · 1d
Context pruning: cut LLM tokens without losing quality (9 minute read)
🎯 LLM Finetuning
redis.io · 3d
The Best Open Source and Open-Weight LLM Models to Run Locally in 2026
💻 Local AI
huggingface.co · 2d
not much happened today
🤖 AI Agents
news.smol.ai · 5d
A cheap fix that saves the AI $400M dollars a year and brings 4B people online
⚡ Quantization
codecai.net · 3d · Hacker News
Why Vision LLMs Force A Rethink Of Edge AI Hardware
🎯 LLM Finetuning
semiengineering.com · 6d
Find bugs in YOUR code using OpenCode, Llama.cpp and Qwen3.6
💻 Local AI
wtarreau.blogspot.com · 3d · Lobsters, Hacker News, wtarreau.blogspot.com
Lever: Speculative LLM Inference on Smartphones
🧠 LLMs
arxiv.org · 2d
Maker packs an opinionated, googly-eyed AI chatbot into a mobile suitcase, powered by an Nvidia Jetson — entirely local machine entity runs Gemma 4 E4B and can respond in 200ms
🔓 Open Source AI
tomshardware.com · 3d
michelangeloromerochisco/ternative: Inference engine for ternary-weight LLMs with runtime LoRA - the llama.cpp of BitNet models
💻 Local AI
github.com · 1d · Hacker News
KVDrive: A Holistic Multi-Tier KV Cache Management System for Long-Context LLM Inference
💻 Local AI
arxiv.org · 2d