Expert-aware quantisation: near-Q4 quality at near-Q2 size?
The idea is to profile a mixture-of-experts (MoE) model to find which experts matter for a specific task, then quantise the cold (rarely used) ones aggressively. The reported result: near-Q4 quality at near-Q2 size for local models.
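To make the profiling step concrete, here is a minimal sketch of the idea, not the article's actual implementation. It assumes a routing trace is available as a list of the expert indices chosen for each token in a single MoE layer, and it maps "Q4" and "Q2" to nominal 4-bit and 2-bit widths. The names `assign_expert_bits`, `average_bits` and `hot_fraction` are hypothetical, as is the toy trace.

```python
from collections import Counter


def assign_expert_bits(routing_trace, num_experts,
                       hot_fraction=0.25, hot_bits=4, cold_bits=2):
    """Pick a nominal bit-width per expert from observed routing counts.

    routing_trace: iterable of per-token lists of expert indices
    (an assumed format; real runtimes expose router output differently).
    """
    counts = Counter()
    for chosen in routing_trace:
        counts.update(chosen)

    # Rank experts by how often the router picked them, most used first.
    # Ties are broken by expert index so the result is deterministic.
    ranked = sorted(range(num_experts), key=lambda e: (-counts[e], e))

    # Keep the top fraction "hot" at the higher precision, at least one.
    num_hot = max(1, round(hot_fraction * num_experts))
    hot = set(ranked[:num_hot])

    return {e: (hot_bits if e in hot else cold_bits)
            for e in range(num_experts)}


def average_bits(bits_by_expert):
    """Mean nominal bits per expert, assuming equally sized experts."""
    return sum(bits_by_expert.values()) / len(bits_by_expert)


# Toy trace: 5 tokens, top-2 routing over 8 experts (made-up data).
trace = [[0, 3], [0, 1], [3, 0], [0, 3], [2, 3]]
plan = assign_expert_bits(trace, num_experts=8)
print(plan)
print(average_bits(plan))
```

With this toy trace, experts 0 and 3 stay at 4 bits and the other six drop to 2, so the average sits much closer to 2 bits than to 4. Because the trace is task-specific, a plan profiled on one workload may quantise experts that another workload relies on.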