Speculative Decoding on Mobile GPUs

Discussed on DEV

---
title: "Speculative Decoding on Mobile GPUs: Draft-Verify LLM Pipelines with Vulkan Compute"
published: true
description: "Build a speculative decoding pipeline on Android using Vulkan compute shaders for draft models and NNAPI for verification, with adaptive batch scheduling."
tags: android, kotlin, architecture, performance
canonical_url:
---

## What We Are Building

In this workshop, we are going to wire up a speculative decoding pipeline that runs entirely on-device on Android. A small...
