📉 Model Quantization
Specific
INT8, Post-Training, QAT, Pruning, Model Compression
Scoured 49 posts in 7.2 ms
Pruned YOLOv8 ONNX INT8 Fails: 3 Fixes That Work
🔄 ONNX · Content type: Blog, Discussion · tildalice.io · 5 days ago
Understanding Quantization-Aware Training: Gradients at Quantized Weights Bias to the Low-Loss Basin
🔄 ONNX · Content type: Academic · arxiv.org · 2 days ago
Gemma 4 QAT on 10GB Laptop: Local AI with 6.7GB VRAM
🏎️ TensorRT · everylocalai.com · 9 hours ago · DEV
Linux 7.2 Preparing Intel Key Protection Technology "KPT" For Next-Gen QAT
🔧 PTX · phoronix.com · 1 day ago
Gemma 4 QAT models: Optimizing model compression for mobile and laptop efficiency
🔄 ONNX · Content type: News, Blog · blog.google · 5 days ago · Hacker News
The latest Gemma 4 models use a training trick to slash their on-device memory footprint
🔄 ONNX · androidauthority.com · 5 days ago
Google Shrank Gemma 4 by 72% and Unsloth Fixed the 4-Bit Bug Nobody Else Caught on One 4090, and 4-Bit Shouldn’t Be This Good
🔄 ONNX · Content type: Blog · towardsai.net · 2 days ago
Unsloth Gemma 4 QAT
🔄 ONNX · unsloth.ai · 5 days ago
Google DeepMind releases Gemma 4 QAT, but Unsloth developer Daniel Han warns naive llama.cpp conversions suffer accuracy loss
🔄 ONNX · Content type: News · digg.com · 5 days ago
Less-relevant results
Xiaomi MiMo-V2.5-Pro Just Hit 1,000 Tokens Per Second!
🔄 ONNX · gizchina.com · 1 day ago
Google releases Gemma 4 QAT models for local AI on enterprise laptops
🔄 ONNX · 4sysops.com · 4 days ago
local llm on laptop 780M GPU using llama + gemma 4 qat
✂️ CUTLASS · Content type: Blog · alper.bearblog.dev · 4 days ago
LC-QAT: Data-Efficient 2-Bit QAT for LLMs via Linear-Constrained Vector Quantization
🔄 ONNX · Content type: Academic · arxiv.org · 1 day ago
google/gemma-4-12B-it-qat-q4_0-gguf
🛠 Ml-eng · huggingface.co · 5 days ago
GGUF vs GPTQ vs AWQ: The Plain-English Guide to LLM Quantization (and Which One to Pick)
🔄 ONNX · vettedconsumer.com · 4 days ago · Hacker News
[AINews] FrontierCode: Benchmarking for Code Quality over Slop
🛠 Ml-eng · Content type: News · latent.space · 1 day ago
Shrinking a Neural Network Often Makes It Smarter
🎓 Model Distillation · siliconopera.com · 19 hours ago
MiMo-v2.5-Pro-UltraSpeed: 1T model with 1000 TPS
🔄 ONNX · Content type: Blog · mimo.xiaomi.com · 3 days ago · Hacker News, r/LocalLLaMA
Joint Structural Pruning and Mixed-Precision Quantization for LLM Compression
🔄 ONNX · Content type: Academic · arxiv.org · 2 days ago
OpenAI govt stake 🇺🇸, Google compute deal 🚀, Microsoft Scout launch 🤖
🔄 ONNX · tldr.tech · 3 days ago