GPU Kernels, Python DSL, CUDA Alternative, Performance Optimization