NVIDIA/TensorRT-LLM

Covered by 6 sources, including Luca Cavallin and digitalocean.com. Discussed on Hacker News.

Covered in 6 articles

Luca Cavallin · AI Engineering for Developers

Discussed on Hacker News

digitalocean.com · The MoE-ification of the Open Model Ecosystem, and What It Means for Your Inference Bill

Build high-performance generative AI systems with Strands Agents, NVIDIA NIM, and Amazon Bedrock AgentCore

NVIDIA Technical Blog · Boost Inference Performance up to 15x on NVIDIA Blackwell Using DFlash Speculative Decoding

lucavallin.com · AI Engineering for Developers | Blog

ParallelKernelBench: Frontier LLMs can't write fast multi-GPU kernels (yet)