We built a CUDA emulator that profiles GPU code with zero hardware
rightnowai.coΒ·4hΒ·
Discuss: Hacker News

We needed to test our CUDA kernels on 15 different GPUs. The problem? Renting all of them costs $3,000 a month. Just for testing.

That’s when we thought: what if we could predict how a kernel runs on any GPU without actually owning it?

Not some rough guess. Real numbers. Like, your kernel takes 2.4ms on an RTX 4090 and 5.1ms on a V100. Within 1% of actual hardware.

Three months later, we built it. Now developers test kernels on 50+ GPUs before writing a single line. One team saved $18,000 in cloud costs. Another found a bug on an A100 they’ve never even touched.

Here’s how it works:

The problem: Testing is expensive

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  The Multi-GPU Testing Problem                               β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€...

Similar Posts

Loading similar posts...