⚡ Quantization
Model Compression, INT8, Weight Quantization
Scoured 52 posts in 89.1 ms

KV cache quantization: what FP8/INT8 K and V actually buy you, and where they break
⚡ Inference · Blog · dev.to · 4 days ago · DEV

MiMo-v2.5-Pro-UltraSpeed: 1T model with 1000 TPS
⚡ Inference · Blog · mimo.xiaomi.com · 2 days ago · Hacker News, r/LocalLLaMA

HNSW vs LSH: How Elasticsearch hits 0.99 recall@10 at 15,000 QPS — and what it costs
🏢 Architecture · Blog · elastic.co · 1 day ago

Gemma 4 QAT models: Optimizing model compression for mobile and laptop efficiency
🔓 Open Source AI · News, Blog · blog.google · 4 days ago · Hacker News

I switched from LM Studio to llama.cpp, and I'm never going back to a bloated wrapper
🖥️ Local AI · howtogeek.com · 1 day ago

alexziskind1/model-shelf: Model Shelf is a local-first model resolver that helps AI agents and scripts find model weights on your own storage before downloading from Hugging Face. Point it at an internal SSD, NAS, external SSD, or Thunderbolt DAS, and it returns the best local path for GGUF, MLX, safetensors, Ollama, vLLM, and other local AI workflows.
🖥️ Local AI · Code · github.com · 5 days ago

ComfyUI NVFP4 in 2026: 3× Faster Image Generation on RTX 50-Series (and the Right Format for RTX 40-Series)
🟩 Nvidia · Blog · dev.to · 1 hour ago · DEV

The latest Gemma 4 models use a training trick to slash their on-device memory footprint
🔓 Open Source AI · androidauthority.com · 4 days ago

I built a fully local AI coding assistant in Windows with Ollama and VS Code
🧠 LLM Tooling · howtogeek.com · 1 day ago

DeepSeek V4, LeCun's Bet Against LLMs, and Lovable's Self-Improving Agent - The Tokenizer Edition #30
🤖 AI · newsletter.artofsaience.com · 5 days ago

Doubling Qwen3.6-27B on One RTX 3090: ollama → llama.cpp + MTP, Lever by Lever (35.7 → 80.2 tok/s)
🖥️ Local AI · Blog · dev.to · 22 hours ago · DEV

not much happened today | AINews
🔓 Open Source AI · news.smol.ai · 5 days ago

How to Run an LLM Locally on Your Mobile Phone with QVAC and Expo
🤖 GenAI · freecodecamp.org · 6 days ago

Speculators v0.5.0: DFlash support and online training
⚡ Inference · developers.redhat.com · 6 days ago

Open-LLM-VTuber Review: Offline AI Companion with Live2D
🧠 LLM · Blog · dev.to · 1 day ago · DEV

Google Gemma 4 12B: Architecture, Benchmarks, Access, and Hands-on Guide for Developers
🔓 Open Source AI · Blog · analyticsvidhya.com · 4 days ago

Launch HN: General Instinct (YC P26) – Frontier models on edge devices
🚀 Frontier AI · Discussion · news.ycombinator.com · 4 days ago · Hacker News

How to Tune llama.cpp --n-gpu-layers: A Practical VRAM Guide (2026)
🖥️ Local AI · Blog · dev.to · 17 hours ago · DEV

Intel's mysterious new datacenter GPU is what Nvidia's Rubin CPX nearly was
🟩 Nvidia · theregister.com · 5 days ago

3x Faster Search: Parallel Test-Time Scaling with Instructed-Retriever-1
⚡ Inference · Blog · databricks.com · 5 days ago