⚡ Quantization
Model Compression, INT8, Weight Quantization
Quality Is Not a Safety Proxy Under Quantization
🔐 Cryptography · Content type: Academic · arxiv.org · 23 hours ago

Show HN: Run Llama.cpp In-Process from Java with Project Panama FFM
💬 LLMs · deemwar-products.github.io · 5 days ago · Hacker News

Google releases Gemma 4 QAT models for local AI on enterprise laptops
⚡ Hardware Acceleration · 4sysops.com · 4 days ago

fix(memory): move local llama.cpp runtime to provider plugin · openclaw/openclaw@3137110
💬 LLMs · Content type: Code · github.com · 1 day ago

Apples to Apples: MLX vs. Llama.cpp for Gemma 4 12B on an M1 16GB
💬 LLMs · Content type: Blog · ziraph.com · 5 days ago · Hacker News

UniSVQ: 2-bit Unified Scalar-Vector Quantization
📊 Vector Quantization · Content type: Academic · arxiv.org · 23 hours ago

Running Qwen 35B MoE at 450k Context on a Single 32GB GPU
✍️ Prompt Engineering · local-llm.utop.workers.dev · 3 days ago · Hacker News

Gemma 4 12B: A unified, encoder-free multimodal model
💬 LLMs · Content type: Discussion · news.ycombinator.com · 3 days ago · Hacker News

A system programmer’s guide to LLM inference
🔤 Tokenization · Content type: Blog · blog.xiangpeng.systems · 3 days ago · Hacker News

Ideogram4 GGUF is out!
🎨 Generative AI · huggingface.co · 3 days ago · r/StableDiffusion

Joint Structural Pruning and Mixed-Precision Quantization for LLM Compression
💬 LLMs · Content type: Academic · arxiv.org · 1 day ago

2x GH200 for LLM inference, Part 2: vLLM, DeepSeek V4 Flash, and MTP
💬 LLMs · Content type: Blog · dnhkng.github.io · 3 days ago

BeeLlama.cpp DFlash on Strix Halo: 2.7x Gemma 31B, But MTP Is Still Faster
🎮 Game Engines · sleepingrobots.com · 4 days ago

Trainable Smooth-Rotation Transforms with Learned Channel Scales for LLM Quantization
🎛️ Fine-tuning · Content type: Academic · arxiv.org · 23 hours ago

MiMo-v2.5-Pro-UltraSpeed: 1T model with 1000 TPS
⚡ Speculative Decoding · Content type: Blog · mimo.xiaomi.com · 3 days ago · Hacker News, r/LocalLLaMA

LC-QAT: Data-Efficient 2-Bit QAT for LLMs via Linear-Constrained Vector Quantization
📊 Vector Quantization · Content type: Academic · arxiv.org · 23 hours ago

FAIR-Calib: Frontier-Aware Instability-Reweighted Calibration for Post-Training Quantization of Diffusion Large Language Models
🎛️ Fine-tuning · Content type: Academic · arxiv.org · 2 days ago

stable-diffusion.cpp/docs/quantization_and_gguf.md at master · leejet/stable-diffusion.cpp
🤖 AI · Content type: Code · github.com · 3 days ago · r/StableDiffusion

On Low-Bit Quantization Errors in Speaker Verification: Diagnostic and Mitigation
📊 Vector Quantization · Content type: Academic · arxiv.org · 1 day ago

google/gemma-4-12B-it-qat-q4_0-gguf
🤖 AI · huggingface.co · 5 days ago