DFlash and Spec V2 Decoding (14 minute read)

Discussed on Hacker News

Covers 5 related stories

Looking for a self-hosted alternative to Modal.com for running ML workloads

Discussed on r/selfhosted

mimo.xiaomi.com

MiMo-v2.5-Pro-UltraSpeed: 1T model with 1000 TPS

Discussed on Hacker News and r/LocalLLaMA

Accelerating Gemma 4: faster inference with multi-token prediction drafters

Discussed on Hacker News

[2211.17192] Fast Inference from Transformers via Speculative Decoding

EAGLE-3: Scaling up Inference Acceleration of Large Language Models via Training-Time Test