A Fast Attention Kernel for MI300X, Written in HIP, Not Assembly
A deep dive into a bf16 forward attention kernel for AMD MI300X, written in HIP with instruction-level control rather than raw assembly.