Scour
Model Quantization
Keywords: INT8, Post-Training, QAT, Pruning, Model Compression
Scoured 146538 posts in 10.1 ms
Efficient Quantization of Mixture-of-Experts with Theoretical Generalization Guarantees (TensorRT) · arxiv.org · 15h
Attn-QAT: Making 4-Bit Attention Actually Work (Flash Attention) · haoailab.com · 1d
MF-QAT: Multi-Format Quantization-Aware Training for Elastic Inference (TensorRT) · arxiv.org · 6d
MoBiE: Efficient Inference of Mixture of Binary Experts under Post-Training Quantization (ONNX Runtime) · arxiv.org · 15h
RUQuant: Towards Refining Uniform Quantization for Large Language Models (TensorRT) · arxiv.org · 2d
QAPruner: Quantization-Aware Vision Token Pruning for Multimodal Large Language Models (TensorRT) · arxiv.org · 3d
TC-AE: Unlocking Token Capacity for Deep Compression Autoencoders (TensorRT) · arxiv.org · 15h
Zero-Shot Quantization via Weight-Space Arithmetic (TensorRT) · arxiv.org · 2d
A mathematical framework for parameter recovery in large language models via a joint Euclidean mirror (Model Distillation) · arxiv.org · 15h
Neural Network Pruning via QUBO Optimization (Kernel Fusion) · arxiv.org · 1d
Collaborative Multi-Mode Pruning for Vision-Language Models (Model Distillation) · arxiv.org · 3d
Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability (Model Distillation) · arxiv.org · 15h
MUXQ: Mixed-to-Uniform Precision MatriX Quantization via Low-Rank Outlier Decomposition (Tensor Cores) · arxiv.org · 2d
Beyond Loss Values: Robust Dynamic Pruning via Loss Trajectory Alignment (Gradient Accumulation) · arxiv.org · 15h
SoLA: Leveraging Soft Activation Sparsity and Low-Rank Decomposition for Large Language Model Compression (Gradient Accumulation) · arxiv.org · 2d
Fast NF4 Dequantization Kernels for Large Language Model Inference (Attention Kernels) · arxiv.org · 3d
DINO-QPM: Adapting Visual Foundation Models for Globally Interpretable Image Classification (cuDNN) · arxiv.org · 15h
Choosing the Right Regularizer for Applied ML: Simulation Benchmarks of Popular Scikit-learn Regularization Frameworks (TensorRT) · arxiv.org · 2d
Sparse Bayesian Learning Algorithms Revisited: From Learning Majorizers to Structured Algorithmic Learning using Neural Networks (Model Distillation) · arxiv.org · 3d
Rényi Attention Entropy for Patch Pruning (Attention Kernels) · arxiv.org · 2d