AWS re:Invent 2025 - High-performance inference for frontier AI models (AIM226)

🦄 Making great presentations more accessible. This project aims to enhance multilingual accessibility and discoverability while maintaining the integrity of the original content. Detailed transcriptions and keyframes preserve the nuances and technical insights that make each session compelling.

Overview

📖 AWS re:Invent 2025 - High-performance inference for frontier AI models (AIM226)

In this video, Philip Kiely from Baseten discusses high-performance inference for frontier AI models, introducing the concept of inference engineering. He explains how Baseten optimizes runtime performance through techniques like quantization (including NVFP4), speculative decoding (Eagle 3, look-ahead decoding), KV cache-aware routing, parallelism for mixture of experts architectures, a…
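The summary above names several runtime techniques; as a rough illustration of one of them, here is a toy sketch of the draft-then-verify accept/reject loop that underlies speculative decoding. The `draft_probs` and `target_probs` stubs, the tiny vocabulary, and the sampling setup are hypothetical placeholders for illustration only, not Baseten's or the talk's implementation (Eagle 3 and look-ahead decoding use different drafting strategies than a plain draft model).

```python
# Toy sketch of speculative decoding's accept/reject step. All model stubs and
# the vocabulary size are hypothetical placeholders, not a real inference engine.
import numpy as np

rng = np.random.default_rng(0)
VOCAB = 8  # tiny hypothetical vocabulary


def draft_probs(context):
    """Stand-in for a small, fast draft model's next-token distribution."""
    logits = rng.normal(size=VOCAB)
    return np.exp(logits) / np.exp(logits).sum()


def target_probs(context):
    """Stand-in for the large target model's next-token distribution."""
    logits = rng.normal(size=VOCAB)
    return np.exp(logits) / np.exp(logits).sum()


def speculative_step(context, k=4):
    """Draft k tokens cheaply, then verify them against the target model."""
    # 1) Draft: the small model proposes k tokens autoregressively.
    drafted, draft_dists = [], []
    for _ in range(k):
        q = draft_probs(context + drafted)
        draft_dists.append(q)
        drafted.append(int(rng.choice(VOCAB, p=q)))

    # 2) Verify: the target model scores each drafted position
    #    (in a real engine this is a single batched forward pass).
    accepted = []
    for i, (x, q) in enumerate(zip(drafted, draft_dists)):
        p = target_probs(context + drafted[:i])
        # Accept with probability min(1, p(x)/q(x)) so the output still
        # follows the target model's distribution.
        if rng.random() < min(1.0, p[x] / q[x]):
            accepted.append(x)
        else:
            # On rejection, resample from the residual max(0, p - q), renormalized,
            # and stop accepting further drafted tokens.
            residual = np.maximum(p - q, 0.0)
            accepted.append(int(rng.choice(VOCAB, p=residual / residual.sum())))
            break
    return accepted


print(speculative_step(context=[1, 2, 3]))
```

The point of the draft-then-verify split is that the expensive target model checks several candidate tokens per forward pass instead of generating one token at a time, which is where the latency win comes from.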
