As we anticipate the first shipments of NVIDIA’s "Vera Rubin" in late summer, NVIDIA has reportedly upgraded its superchip multiple times since March to better compete with AMD’s upcoming Instinct MI400 series of accelerators. According to SemiAnalysis, NVIDIA’s initial memory bandwidth target for the "Vera Rubin" VR200 NVL72 system was 13 TB/s in March, which was raised to 20.5 TB/s by September. At CES 2026, however, NVIDIA confirmed that the VR200 NVL72 is now operating at 22 TB/s. Against AMD’s Instinct MI455X accelerator, which delivers 19.6 TB/s, NVIDIA initially trailed on bandwidth; the company closed the gap by using faster DRAM and improving the interconnects between CPUs, GPUs, and the rest of the system.
In November, AMD compared the MI400 lineup to NVIDIA’s upcoming "Vera Rubin" series, claiming similar compute performance and memory bandwidth but 1.5 times higher memory capacity and scale-out bandwidth. AMD claims the MI455X will deliver up to 40 PFLOPS of FP4 and 20 PFLOPS of FP8 compute, roughly twice the compute performance of the current MI350. The GPUs also transition from HBM3e to HBM4 memory, increasing capacity from 288 GB to 432 GB and raising total bandwidth from 8 TB/s to 19.6 TB/s. Each GPU provides 300 GB/s of scale-out bandwidth and adds broader AI data format support along with expanded AI pipelines. Two models are planned: the Instinct MI455X for large-scale AI training and inference, and the MI430X for HPC and AI.
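As a quick sanity check on those generational claims, here is a minimal Python sketch that recomputes the ratios from the figures quoted above; the MI350 FP4 number is inferred from the "roughly twice" claim, and the dictionary layout is purely illustrative.

```python
# Back-of-the-envelope check of the generational ratios quoted above.
# Figures come from the article; the MI350 FP4 number is inferred from
# the "roughly twice" claim, not an official spec sheet.

mi350 = {"fp4_pflops": 20, "hbm_gb": 288, "hbm_tbps": 8.0}     # HBM3e
mi455x = {"fp4_pflops": 40, "hbm_gb": 432, "hbm_tbps": 19.6}   # HBM4

print(f"FP4 compute:      {mi455x['fp4_pflops'] / mi350['fp4_pflops']:.2f}x")  # 2.00x
print(f"Memory capacity:  {mi455x['hbm_gb'] / mi350['hbm_gb']:.2f}x")          # 1.50x
print(f"Memory bandwidth: {mi455x['hbm_tbps'] / mi350['hbm_tbps']:.2f}x")      # 2.45x
```

The output matches AMD’s framing: the 1.5x figure applies to memory capacity, while the bandwidth jump from HBM3e to HBM4 is closer to 2.45x.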
For the VR200, NVIDIA has set performance targets of about 50 PFLOPS of FP4 compute per Rubin GPU, resulting in roughly 100 PFLOPS of FP4 for the two-GPU Superchip. Each Rubin GPU integrates two reticle-sized compute chiplets paired with eight HBM4 stacks, providing approximately 288 GB of HBM4 per GPU and about 576 GB for the full Superchip. However, since these GPUs are sold as entire systems, it will be interesting to see how the AMD and NVIDIA solutions perform and what choices hyperscalers make.
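The per-GPU to Superchip scaling above is simple aggregation; the sketch below works it through using only the article’s figures. The per-stack numbers are plain division and assume capacity and bandwidth are spread evenly across the eight HBM4 stacks, which is an assumption rather than a confirmed NVIDIA spec.

```python
# Minimal sketch of the per-GPU to Superchip aggregation described above.
# Inputs come from the article; per-stack figures assume an even split
# across the eight HBM4 stacks (an assumption, not a confirmed spec).

fp4_pflops_per_gpu = 50      # Rubin GPU, FP4 target
hbm4_gb_per_gpu = 288        # eight HBM4 stacks per GPU
hbm4_tbps_per_gpu = 22       # CES 2026 figure quoted earlier
stacks_per_gpu = 8
gpus_per_superchip = 2

print(f"Superchip FP4:  {gpus_per_superchip * fp4_pflops_per_gpu} PFLOPS")  # 100
print(f"Superchip HBM4: {gpus_per_superchip * hbm4_gb_per_gpu} GB")         # 576
print(f"Per-stack size: {hbm4_gb_per_gpu / stacks_per_gpu:.0f} GB")         # 36
print(f"Per-stack BW:   {hbm4_tbps_per_gpu / stacks_per_gpu:.2f} TB/s")     # 2.75
```

Under that even-split assumption, hitting 22 TB/s per GPU implies roughly 2.75 TB/s per HBM4 stack, which is consistent with the report that NVIDIA reached its latest target by adopting faster DRAM.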