FP8 GEMM Optimization on AMD CDNA4 Architecture
Learn how to build high-performance FP8 GEMM kernels on AMD CDNA™4 GPUs using MFMA, LDS swizzling, and double-buffering.
Read the original article