Re-quantizing a local LLM 14x faster by skipping the tensors that didn't change

Where a 2-bit model spends its bits, and why trying answers used to cost eighty minutes