Making FlashAttention-4 faster for inference
What part of "dtype = 'fp8', num_splits = 0, pack_gqa = True, q_stage = 1, page_size = 1" do you not understand?
Read the original article