Made an interactive explainer about speculative decoding/MTP

Discussed on r/LocalLLaMA

Covers 7 related stories

GLM-5.2 (6 minute read)

DeepSeek-V3 Technical Report

Discussed on Hacker News

Accelerating Gemma 4: faster inference with multi-token prediction drafters

Discussed on Hacker News

[2211.17192] Fast Inference from Transformers via Speculative Decoding

EAGLE-3: Scaling up Inference Acceleration of Large Language Models via Training-Time Test

Unsloth Qwen3.6

Discussed on r/LocalLLaMA

EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty