📉 Model Quantization
Specific
INT8, Post-Training, QAT, Pruning, Model Compression
Scoured 132 posts in 6.4 ms
DiRotQ: Rotation-Aware Quantization for 4-bit Diffusion Transformers
🏎️ TensorRT · arxiv.org · 2d
GPU Memory Math for LLMs: Formula That Tells You What Fits on Your GPU
📈 GPU Occupancy · theahmadosman.substack.com · 19h · Substack, r/LocalLLaMA
Qwen 27b MTP Config, Llama.cpp Single 3090
📊 Profiling Tools · github.com · 4d · r/LocalLLaMA
Why Shrinking an AI Model Often Makes It More Useful
⚡ ONNX Runtime · siliconopera.com · 1d
reComputer RK3576/RK3588 Edge AI computers are supported by reComputer AI Lab one-click deployment platform
⚡ ONNX Runtime · cnx-software.com · 3h
The custom AI ASIC state of play (May 2026) — Broadcom deals, Google TPUs, Meta MTIA & beyond
🔧 PTX · tomshardware.com · 4h
Need a second pair of eyes, this Qwen3.6 27B quant recipe consistently thinks less and is correct
⚡ ONNX Runtime · huggingface.co · 6d · r/LocalLLaMA
A compressed sensing neuromorphic processor for sparse signal classification
📊 Gradient Accumulation · frontiersin.org · 11h
Initial Benchmarks Of The SpacemiT K3 RVA23 RISC-V CPU With The K3 Pico-ITX
📈 Occupancy Optimization · phoronix.com · 1d · Hacker News
Qwen’s MTP test puts local AI back in startup math
⚡ ONNX Runtime · startupfortune.com · 6d
Command A+: Making sovereign agentic capabilities available to all
🤖 AI Coding Tools · cohere.com · 1d · Hacker News
GRIP-VLM: RL for Efficient Vision-Language Models
📊 Gradient Accumulation · startuphub.ai · 6d
michelangeloromerochisco/ternative: Inference engine for ternary-weight LLMs with runtime LoRA - the llama.cpp of BitNet models
🔄 ONNX · github.com · 1d · Hacker News
Forlinx rolls out FET3572-C SoM and OK3572-C board with Rockchip RK3572
🧠 CPU Architecture · linuxgizmos.com · 3d
Quant.npu: Enabling Efficient Mobile NPU Inference for on-device LLMs via Fully Static Quantization
🏎️ TensorRT · arxiv.org · 12h
Inside the M4 Apple Neural Engine, Part 2: ANE Benchmarks
🎯 Tensor Cores · maderix.substack.com · 3d · Substack
TFLite Model Conversion: 10 Commands That Actually Work
🔄 ONNX · tildalice.io · 3d
AMD promises to bring improved, hardware-backed FSR 4 upscaling to older Radeon GPUs
🎯 GPU Kernels · arstechnica.com · 6d
E-PMQ: Expert-Guided Post-Merge Quantization with Merged-Weight Anchoring
🏎️ TensorRT · arxiv.org · 2d
Quantization From First Principles: Build Your Own INT8 Inference Engine
🏎️ TensorRT · medium.com · 5d