Re-quantizing a local LLM 14x faster by skipping the tensors that didn't change
Where a 2-bit model spends its bits, and why trying answers used to cost eighty minutes
Read the original article