RT by @awnihannun:
Qwen3.6 4-bit DWQ is now up on MLX. It uses a custom quantization scheme (4-bit MLP, 8-bit everything else) plus DWQ for additional gains. It gets 0.0225 KL w/ the base model and matches it on PPL, versus 0.0819 for a naive 4-bit quant. Adds only 0.25 BPW! huggingface.co/mlx-community…
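For anyone wondering what a "KL w/ the base model" number measures, here is a minimal sketch in plain Python. It assumes the metric is KL(base || quantized) over next-token distributions, in nats, averaged over token positions; the post does not state the direction, units, or evaluation data. The helper names (`log_softmax`, `kl_divergence`, `mean_kl`) and the toy logits are hypothetical and are not taken from MLX.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def kl_divergence(base_logits, quant_logits):
    """KL(P_base || P_quant) in nats for a single token position.

    Assumption: the base model is the reference distribution.
    """
    log_p = log_softmax(base_logits)
    log_q = log_softmax(quant_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def mean_kl(base_batch, quant_batch):
    """Average the per-position KL over a batch of positions."""
    kls = [kl_divergence(b, q) for b, q in zip(base_batch, quant_batch)]
    return sum(kls) / len(kls)


# Toy logits over a 3-token vocabulary (made-up numbers, not model outputs).
base = [[2.0, 1.0, 0.1], [0.5, 0.5, 3.0]]
quant = [[1.9, 1.1, 0.2], [0.4, 0.7, 2.8]]
print(f"mean KL: {mean_kl(base, quant):.4f}")
```

A value of 0 means the quantized model reproduces the base model's token distribution exactly, so lower is better.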