Scour
⚡ Quantization
Model Compression, INT8, Weight Quantization
Scoured 45 posts in 6.8 ms
GGUF vs GPTQ vs AWQ: The Plain-English Guide to LLM Quantization (and Which One to Pick)
🏷️ Named Entity Recognition
vettedconsumer.com · 4 days ago · Hacker News
Quality Is Not a Safety Proxy Under Quantization
🏷️ Named Entity Recognition
Content type: Academic
arxiv.org · 1 day ago
Qwen 3.6 27B AutoRound GGUF, need your feedback
🗜️ Compression Algorithms
huggingface.co · 1 day ago · r/LocalLLaMA
Pruned YOLOv8 ONNX INT8 Fails: 3 Fixes That Work
🎲 Probabilistic Inference
Content type: Blog, Discussion
tildalice.io · 5 days ago
TurboQuant in PostgreSQL
🗄 Databases
Content type: Blog
blog.mayflower.de · 3 hours ago
Holding the FP8 Quality Ceiling at 8-Bit Weights and Activations: INT8 and GGUF Post-Training Quantization of Ideogram 4.0 for Consumer GPUs
🎲 Probabilistic Inference
Content type: Academic
arxiv.org · 8 hours ago
HNSW vs LSH: How Elasticsearch hits 0.99 recall@10 at 15,000 QPS — and what it costs
🏗 Datastructures
Content type: Blog
elastic.co · 2 days ago
Apple WWDC On-Device AI Deep Dive - Google Docs
🎲 Probabilistic Inference
gist.is · 14 hours ago · Hacker News
RightNow-AI/AutoMegaKernel: An agent harness that compiles a model into one provably-correct, self-retargeting CUDA megakernel and self-tunes it past cuBLAS at batch-1 LLM decode.
🔀 CRDTs
Content type: Code
github.com · 2 days ago · Hacker News
MoQ GGUFs and GSQ: Low-Bit GGUFs Are About to Get Much Better
🏷️ Named Entity Recognition
Content type: News, Blog
kaitchup.substack.com · 5 days ago · r/LocalLLaMA
TileFuse: A Fused Mixed-Precision Kernel Library for Efficient Quantized LLM Inference on AMD NPUs
🎲 Probabilistic Inference
Content type: Academic
arxiv.org · 8 hours ago
What's in the Box? A Field Guide to AI Models
🧮 Hindley-Milner
Content type: Blog
iankduncan.com · 2 days ago
AMD just reserved the right to disappoint handheld and Steam Machine gamers.
⚡ Effect Systems
Content type: News
theverge.com · 5 days ago
Train Models Faster with JAX and MaxText Using NVFP4 on NVIDIA Blackwell
🎲 Probabilistic Inference
Content type: News, Blog
developer.nvidia.com · 2 days ago
I wired a fully offline voice loop to Ollama + LM Studio — 100% CPU, no GPU, nothing leaves your machine (Silero VAD + Parakeet STT + Supertonic TTS 3)
🗜️ Compression Algorithms
Content type: Code
github.com · 9 hours ago · r/LocalLLaMA
AMD: No Definitive Decision on FSR 4.1 Support for RDNA 3.5 APUs
🔀 CRDTs
techpowerup.com · 6 days ago · r/hardware
ScaleSweep: Accurate NVFP4 Post-Training Quantization of LLMs via Block Scale Initialization
🎲 Probabilistic Inference
Content type: Academic
arxiv.org · 2 days ago
MiMo-v2.5-Pro-UltraSpeed: 1T model with 1000 TPS
🔀 CRDTs
Content type: Blog
mimo.xiaomi.com · 3 days ago · Hacker News, r/LocalLLaMA
A system programmer’s guide to LLM inference
🧩 Constraint Programming
Content type: Blog
blog.xiangpeng.systems · 3 days ago · Hacker News
SpectrumKV: Per-Token Mixed-Precision KV Cache Transfer for Prefill-Decode Disaggregated LLM Serving
🔀 CRDTs
Content type: Academic
arxiv.org · 2 days ago