⚡ Quantization
Model Compression, INT8, Weight Quantization
GGUF vs GPTQ vs AWQ: The Plain-English Guide to LLM Quantization (and Which One to Pick)
💬 LLMs · vettedconsumer.com · 4 days ago · Hacker News
Understanding Quantization-Aware Training: Gradients at Quantized Weights Bias to the Low-Loss Basin
🤖 AI · Content type: Academic · arxiv.org · 1 day ago
Ollama 0.30 GPU Boost: Faster local Qwen inference on NVIDIA
🎮 Game Engines · everylocalai.com · 1 hour ago · DEV
Qwen 3.6 27B AutoRound GGUF, need your feedback
💬 LLMs · huggingface.co · 1 day ago · r/LocalLLaMA
Gemma 4 QAT models: Optimizing model compression for mobile and laptop efficiency
💬 LLMs · Content type: News, Blog · blog.google · 5 days ago · Hacker News
Ollama 0.30 delivers faster NVIDIA GPU performance and wider hardware support
🎮 Game Engines · alternativeto.net · 2 days ago
Unsloth Gemma 4 QAT
🤖 AI · unsloth.ai · 5 days ago
Remove padding and multiple D2D copies for MTP by gaugarg-nv · Pull Request #24086 · ggml-org/llama.cpp
🎮 Game Engines · Content type: Code · github.com · 3 hours ago · r/LocalLLaMA
MoQ GGUFs and GSQ: Low-Bit GGUFs Are About to Get Much Better
🎛️ Fine-tuning · Content type: News, Blog · kaitchup.substack.com · 5 days ago · r/LocalLLaMA
MiMo-v2.5-Pro-UltraSpeed: 1T model with 1000 TPS
⚡ Speculative Decoding · Content type: Blog · mimo.xiaomi.com · 2 days ago · Hacker News, r/LocalLLaMA
local llm on laptop 780M GPU using llama + gemma 4 qat
💬 LLMs · Content type: Blog · alper.bearblog.dev · 4 days ago
Here's a llama.cpp CLI Command builder.
💬 LLMs · llamabuilding.com · 1 day ago · r/LocalLLaMA
Optimal Post-Training Quantization Scales and Where to Find Them
💬 LLMs · Content type: Academic · arxiv.org · 18 hours ago
DeskDash - a free Windows tool to easily manage your GGUF files
✍️ Prompt Engineering · gerry7.itch.io · 3 days ago · r/LocalLLaMA
Show HN: Run Llama.cpp In-Process from Java with Project Panama FFM
💬 LLMs · deemwar-products.github.io · 5 days ago · Hacker News
Running Qwen 35B MoE at 450k Context on a Single 32GB GPU
✍️ Prompt Engineering · local-llm.utop.workers.dev · 3 days ago · Hacker News
Introducing the Third Generation of Apple’s Foundation Models
💬 LLMs · machinelearning.apple.com · 2 days ago · Hacker News, r/apple
Quality Is Not a Safety Proxy Under Quantization
🔐 Cryptography · Content type: Academic · arxiv.org · 18 hours ago
Launch HN: General Instinct (YC P26) – Frontier models on edge devices
✍️ Prompt Engineering · Content type: Discussion · news.ycombinator.com · 5 days ago · Hacker News
1-bit and 1.58 bit LLM Benchmarking on Jetson Orin Nano Super | Bonsai LM
💬 LLMs · smolhub.com · 2 days ago · r/LocalLLaMA