learning gpu programming basics till flash attention probably as first milestone
sankalp.bearblog.dev·37w
24 Mar, 2025

log 31 march (i fell sick last 3-4 days. continuing from yesterday.)

next_power_of_two - use this model
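a minimal sketch of the rounding itself, assuming this note is about picking a block size the way triton.next_power_of_2 does (tl.arange needs a power-of-two length). the helper below is a hypothetical plain-python stand-in, not the triton function:

```python
def next_power_of_two(n: int) -> int:
    """Smallest power of two that is >= n (hypothetical helper, plain Python)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    # (n - 1).bit_length() is the number of bits needed for n - 1,
    # so shifting 1 by that amount gives the first power of two >= n.
    return 1 << (n - 1).bit_length()


# e.g. a row of 1000 elements would get a block of 1024 lanes,
# and the 24 leftover lanes are what a mask has to switch off.
block_size = next_power_of_two(1000)
```

an exact power of two is returned unchanged, so nothing is wasted when the size already fits.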

stride - how many elements to skip in memory to move one step along a dimension. can pass strides to the kernel instead of calling .contiguous() and flattening.
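a rough plain-python sketch of why strides are enough, assuming a 2x3 row-major buffer. read_2d and its argument names are made up for illustration; a triton kernel would do the same offset arithmetic on pointers:

```python
def read_2d(buffer, n_rows, n_cols, stride_row, stride_col):
    """Read a 2D view out of a flat buffer using only strides (no copy of the buffer)."""
    return [
        [buffer[r * stride_row + c * stride_col] for c in range(n_cols)]
        for r in range(n_rows)
    ]


flat = list(range(6))  # a 2x3 matrix stored row-major: [[0, 1, 2], [3, 4, 5]]

# Row-major 2x3 view: next row is 3 elements away, next column is 1 away.
original = read_2d(flat, n_rows=2, n_cols=3, stride_row=3, stride_col=1)

# Transposed 3x2 view of the SAME buffer: just swap the strides.
transposed = read_2d(flat, n_rows=3, n_cols=2, stride_row=1, stride_col=3)
```

the transposed view is not contiguous in memory, but the stride arithmetic still finds every element, which is the point of passing strides rather than flattening first.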

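mask - useful to avoid reading out-of-bounds elements. let's say we end up with one extra block due to ceiling division: some of its offsets fall past the end of the tensor (up to 1023 of them with a block size of 1024), so the mask makes us skip reading and writing those.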
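the masking idea can be emulated on the cpu. this is a sketch in plain python, not triton: masked_add_block is a hypothetical stand-in for one program instance, and the `if in_bounds` check plays the role of the mask= argument to tl.load / tl.store:

```python
def masked_add_block(x, y, out, pid, block_size):
    """Emulate one program instance of a masked vector add (plain Python, not Triton)."""
    n_elements = len(x)
    block_start = pid * block_size
    offsets = [block_start + i for i in range(block_size)]
    # True only for offsets that actually exist in the input.
    mask = [off < n_elements for off in offsets]
    for off, in_bounds in zip(offsets, mask):
        if in_bounds:  # masked-off lanes neither read nor write
            out[off] = x[off] + y[off]
    return mask


x = list(range(10))
y = [10 * v for v in x]
out = [None] * len(x)
block_size = 4

# 10 elements with blocks of 4 need 3 blocks; the last one is only half full.
masks = [masked_add_block(x, y, out, pid, block_size) for pid in range(3)]
```

without the check, the last block would index positions 10 and 11, which do not exist in a 10-element input.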

vector addition

[screenshot: Vector addition _ Triton GPU Kernels 101 Lesson #4 21-25]

When you launch

n_elements = output.numel()
grid = lambda meta: (triton.cdiv(n_elements, meta['BLOCK_SIZE']),)

How many different kernels are gonna get launched i…
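to make the grid arithmetic concrete, here is a small plain-python sketch. cdiv and launch_grid are hypothetical helpers that mirror what the lambda computes; they do not launch anything, and the sizes are made-up examples:

```python
def cdiv(a: int, b: int) -> int:
    """Ceiling division: how many blocks of size b are needed to cover a elements."""
    return (a + b - 1) // b


def launch_grid(n_elements: int, block_size: int) -> tuple:
    """1D launch grid: one program instance per block (trailing comma makes it a tuple)."""
    return (cdiv(n_elements, block_size),)


def unused_lanes(n_elements: int, block_size: int) -> int:
    """Lanes in the last block that fall past the end of the data."""
    return cdiv(n_elements, block_size) * block_size - n_elements


exact = launch_grid(1024, 1024)     # data fills one block exactly
one_over = launch_grid(1025, 1024)  # one extra element forces a second block
wasted = unused_lanes(1025, 1024)   # lanes of that second block with nothing to do
```

one element past a block boundary costs a whole extra program instance, and almost all of its lanes have to be masked off.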
