From hand-tuned to generated: A reproducible Triton GPU kernel benchmark across different vendors

A reproducible benchmark comparing the speed of hand-tuned, TorchInductor, Helion, and LLM-generated GPU kernels for LLMs. Discover which backend performs best and which is most portable.