Qwen3.6-35B NVFP4 runs on one H100; A100 owners are out

Discussed on DEV

NVIDIA published nvidia/Qwen3.6-35B-A3B-NVFP4 on May 28, 2026. It is a post-training FP4-quantized variant of Alibaba's 35B MoE model, and it fits on a single H100 because it cuts VRAM usage from ~71 GB to ~23 GB. If you're on an A100 or a consumer GPU, jump to the gotchas section first: this quantization format does not run on your hardware.

71 GB → 23 GB: What Gets Quantized and What Doesn't

NVFP4 quantization targets the weights and activations of linear operators inside transformer and MoE blocks specificall...
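The headline numbers can be sanity-checked with a rough, weights-only estimate. This is a sketch under stated assumptions. It uses the NVFP4 block layout (4-bit E2M1 values in 16-element blocks, one FP8 E4M3 scale per block, per-tensor scale ignored) and the nominal 35B parameter count from the model name. The `quantized_fraction` values are hypothetical, chosen to show how leaving some tensors in BF16 pushes the total above the pure-FP4 floor. They are not taken from the model card. KV cache and activation memory are not included.

```python
# Weights-only VRAM estimate (no KV cache, no activations).
# Assumptions: BF16 baseline = 16 bits/param; NVFP4 = 4-bit values in
# 16-element blocks, each block carrying one 8-bit FP8 scale.
# The single per-tensor FP32 scale is ignored as negligible.

BF16_BITS = 16
FP4_BITS = 4
BLOCK_SIZE = 16
BLOCK_SCALE_BITS = 8


def nvfp4_bits_per_param() -> float:
    """Storage cost of one NVFP4 weight, including its share of the block scale."""
    return FP4_BITS + BLOCK_SCALE_BITS / BLOCK_SIZE  # 4 + 8/16 = 4.5


def weight_gb(params: float, bits_per_param: float) -> float:
    """Weight memory in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_param / 8 / 1e9


def mixed_gb(total_params: float, quantized_fraction: float) -> float:
    """Quantized linear layers stored in NVFP4; all remaining params kept in BF16."""
    quantized = total_params * quantized_fraction
    remaining = total_params - quantized
    return weight_gb(quantized, nvfp4_bits_per_param()) + weight_gb(remaining, BF16_BITS)


TOTAL_PARAMS = 35e9  # nominal "35B" from the model name (assumption)

if __name__ == "__main__":
    print(f"BF16 baseline  : {weight_gb(TOTAL_PARAMS, BF16_BITS):.1f} GB")
    print(f"All NVFP4      : {weight_gb(TOTAL_PARAMS, nvfp4_bits_per_param()):.1f} GB")
    # Hypothetical splits between NVFP4 linear layers and BF16 leftovers
    for frac in (0.98, 0.95):
        print(f"{frac:.0%} quantized  : {mixed_gb(TOTAL_PARAMS, frac):.1f} GB")
```

With these inputs the BF16 baseline comes out at 70.0 GB, close to the ~71 GB quoted. A fully NVFP4 model would be about 19.7 GB, and keeping a few percent of parameters in BF16 moves the estimate toward the low 20s. That is consistent in magnitude with ~23 GB, though the actual split of quantized and unquantized tensors is not known from this sketch.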
