Model Quantization
Specific
INT8, Post-Training, QAT, Pruning, Model Compression
Scoured 132 posts in 6.2 ms
needle/docs/simple_attention_networks.md at main
Attention Optimization · github.com · 5d
AMD Is Bringing Improved FSR 4 Upscaling To Its Older GPUs
PTX · hardware.slashdot.org · 5d
Quantized Machine Learning Models for Medical Imaging in Low-Resource Healthcare Settings
TensorRT · arxiv.org · 1d
Why Gemma-4 26B MoE works in HuggingFace but breaks in prod inference engines
ONNX · github.com · 5d · Hacker News
FTerViT: Fully Ternary Vision Transformer
Attention Optimization · arxiv.org · 12h
Theory-optimal Quantization Based on Flatness
TensorRT · arxiv.org · 1d
xxxn3m3s1sxxx/ATLAS-TQ1_0: TQ1.0 ternary inference engine for BitNet b1.58 on CPU. Pack + run Falcon3-1B/3B/7B/10B, no GPU needed.
CUTLASS · github.com · 3d · Hacker News
MegaTrain Full Precision Training of 100B+ Parameter LLMs on a Single GPU
TensorRT · github.com · 4d · Hacker News
K-Quantization and its Impact on Output Performance
TensorRT · arxiv.org · 1d
GAMMA: Global Bit Allocation for Mixed-Precision Models under Arbitrary Budgets
TensorRT · arxiv.org · 2d
SAFE-SVD: Sensitivity-Aware Fidelity-Enforcing SVD for Physics Foundation Models
TensorRT · arxiv.org · 2d
TORQ: Two-Level Orthogonal Rotation for MXFP4 Quantization
Tensor Cores · arxiv.org · 1d
StatQAT: Statistical Quantizer Optimization for Deep Networks
TensorRT · arxiv.org · 2d
Characterizing Learning in Deep Neural Networks using Tractable Algorithmic Complexity Analysis
Gradient Accumulation · arxiv.org · 3d
Cross-Paradigm Knowledge Distillation: A Comprehensive Study of Bidirectional Transfer Between Random Forests and Deep Neural Networks for Big Data Applications
Model Distillation · arxiv.org · 1d
Robust Basis Spline Decoupling for the Compression of Transformer Models
Model Distillation · arxiv.org · 1d
A Hardware-Aware, Per-Layer Methodology for Post-Training Quantization of Large Language Models
Gradient Accumulation · arxiv.org · 6d
Multi-Scale Dequant: Eliminating Dequantization Bottleneck via Activation Decomposition for Efficient LLM Inference
Tensor Cores · arxiv.org · 6d
Not All Tasks Quantize Equally: Fisher-Guided Quantization for Visual Geometry Transformer
TensorRT · arxiv.org · 3d
Forgetting That Sticks: Quantization-Permanent Unlearning via Circuit Attribution
Gradient Accumulation · arxiv.org · 6d