Model optimizations in LLMs
Scoured 48 posts in 7.6 ms
Joint Structural Pruning and Mixed-Precision Quantization for LLM Compression
📊 AI Performance Profiling · Content type: Academic · arxiv.org · 2 days ago
Train Models Faster with JAX and MaxText Using NVFP4 on NVIDIA Blackwell
🧠 Large Language Models (LLMs) · Content type: News, Blog · developer.nvidia.com · 3 days ago
SPEAR: A System for Post-Quantization Error-Adaptive Recovery Enabling Efficient Low-Bit LLM Serving
💬 Prompt optimizations for LLM serving · Content type: Academic · arxiv.org · 18 hours ago
TileFuse: A Fused Mixed-Precision Kernel Library for Efficient Quantized LLM Inference on AMD NPUs
🔧 Systems-level optimizations for LLM serving · Content type: Academic · arxiv.org · 18 hours ago
Understanding Quantization-Aware Training: Gradients at Quantized Weights Bias to the Low-Loss Basin
🔢 Quantization of LLMs · Content type: Academic · arxiv.org · 2 days ago
Value-and-Structure Alignment for Routing-Consistent Quantization of Mixture-of-Experts Models
🔢 Quantization of LLMs · Content type: Academic · arxiv.org · 6 days ago
Quantized Stochastic Primal-Dual Methods for Distributed Optimization under Relaxed Global Geometry
🔢 Quantization of LLMs · Content type: Academic · arxiv.org · 18 hours ago
On Low-Bit Quantization Errors in Speaker Verification: Diagnostic and Mitigation
🔢 Quantization of LLMs · Content type: Academic · arxiv.org · 2 days ago
NSVQ: Mitigating Codebook Collapse by Stabilizing Encoder Drift in Vector Quantization
🔢 Quantization of LLMs · Content type: Academic · arxiv.org · 18 hours ago
OffQ: Taming Structured Outliers in LLM Quantization by Offsetting
🔢 Quantization of LLMs · Content type: Academic · arxiv.org · 3 days ago
ASH: Asymmetric Scalar Hashing With Learned Dimensionality Reduction for High-Fidelity Vector Quantization
🔍 Retrieval-augmented generation · Content type: Academic · arxiv.org · 2 days ago
Holding the FP8 Quality Ceiling at 8-Bit Weights and Activations: INT8 and GGUF Post-Training Quantization of Ideogram 4.0 for Consumer GPUs
🔢 Quantization of LLMs · Content type: Academic · arxiv.org · 18 hours ago
Decision-Aware Memory Cards: Counterfactual-Inspired Context Selection and Compression for Tool-Using LLM Agents
🔢 Quantization of LLMs · Content type: Academic · arxiv.org · 2 days ago
Minimizing the Hidden Cost of Scales: Graph-Guided Ultra-Low-Bit Quantization for Large Language Models
🧠 Large Language Models (LLMs) · Content type: Academic · arxiv.org · 6 days ago
What Limits Does Quantization Place on Dense Top-$k$ Retrieval? A Theoretical Study
🔢 Quantization of LLMs · Content type: Academic · arxiv.org · 18 hours ago
Closing the Indexing-Decoding Gap in Multimodal Generative Retrieval via Prefix Retention Optimization
🔍 Retrieval-augmented generation · Content type: Academic · arxiv.org · 2 days ago
ViP-VL: Vietnamese Self-supervised Speech Pretraining Model with Vector-Quantization Learning
🧠 Large Language Models (LLMs) · Content type: Academic · arxiv.org · 18 hours ago
FQA: A Full-Space Quantization-Driven Architecture for Hardware-Efficient Piecewise Approximation of Nonlinear Activation Functions
🔢 Quantization of LLMs · Content type: Academic · arxiv.org · 6 days ago
APEX4: Efficient Pure W4A4 LLM Inference via Intra-SM Compute Rebalancing
🔧 Systems-level optimizations for LLM serving · Content type: Academic · arxiv.org · 2 days ago
Beyond Per-Token Pricing: A Concurrency-Aware Methodology for LLM Infrastructure Cost Estimation
📊 AI Performance Profiling · Content type: Academic · arxiv.org · 18 hours ago