Scour
⚡ Quantization
Model Compression, INT8, Weight Quantization
Scoured 17790 posts in 28.5 ms

Quantization — Deep Dive + Problem: Smallest Window Containing All Features
🤖 LLM Inference · dev.to · 2d · DEV

Quantization from the ground up
🤖 LLM Inference · simonwillison.net · 6d

OrionsLock/SALOMI: Research code for extreme low-bit transformer quantization and inference.
🤖 LLM Inference · github.com · 8h · Hacker News

The Internalization of Gradients: From Prebiotic Chemistry to Mesa-Optimizers
🔗 Network Effects · lesswrong.com · 2d

Use Ollama with any GGUF Model on Hugging Face Hub
🦙 Ollama · huggingface.co · 2d

What if AI doesn’t need more RAM but better math?
🤖 LLM Inference · adlrocha.substack.com · 4d · Substack

'A high-speed digital cheat sheet': Google unveils TurboQuant AI-compression algorithm, which it claims can hugely reduce LLM memory usage
🤖 LLM Inference · techradar.com · 3d

Google’s TurboQuant Compression Could Increase Demand For AI Memory
🤖 LLM Inference · oodaloop.com · 6d

Local LLM Acceleration: Quantization, TTS, and 1M Tokens/Sec
🤖 LLM Inference · dev.to · 6d · DEV

Built a surgical weight editor for local GGUF models, edit individual weights directly, no GPU, no training loop (open source)
🔍 Binary Diffing · github.com · 3d · DEV, r/LocalLLaMA

Backpropagation Demystified: Neural Nets from First Principles
🧠 Deep Learning · dev.to · 4h · DEV

Perplexity, Smoothing, and What Words Mean
🔤 Tokenization · dev.to · 5h · DEV

I Fine-Tuned a Security Reasoning Model That Runs on a 4GB Laptop (No GPU, No Cloud)
🛡️ AI Security · dev.to · 6d · DEV

How TurboQuant Works for LLMs and Why It Uses Much Less RAM
🤖 LLM Inference · dev.to · 1d · DEV

Why Inference Compression Compounds for Modular Agents
🤖 LLM Inference · dev.to · 2d · DEV

Postprocessing for quantum random number generators: entropy evaluation and randomness extraction
🔐 Post-Quantum Cryptography · dev.to · 2d · DEV

Hidden Markov Models: When Clusters Have Memory
🤖 LLM Inference · dev.to · 2d · DEV

Google's TurboQuant: How They Cut LLM Memory by 6x Without Losing Accuracy
🤖 LLM Inference · dev.to · 5d · DEV

The EM Algorithm: An Intuitive Guide with the Coin Toss Example
🎲 Bayesian Inference · dev.to · 6d · DEV

Building a Linear Regression Model from Scratch with Gradient Descent in Python
📈 Optimization · dev.to · 5d · DEV