Google LiteRT-LM Speeds Up Local Inference Up to 2.2x With Gemma 4 Multi-Token Prediction (opens in new tab)

Covers 3 stories including Blazing fast on-device GenAI with LiteRT-LM

LiteRT-LM brings native support for Gemma 4 Multi-Token Prediction (MTP) drafters, enabling up to 2.2x faster inference. The framework is expanding beyond Kotlin and C++ adding support for new Swift and a JavaScript APIs.

Read the original article