1 Million Tokens Per Second: Qwen 3.5 27B on GKE with B200 GPUs
From 22K tok/s on 4x H100 to 1M+ on 96 B200s. Every failure included.
Read the original article