
🧠 Behavioral Economics | Modal Blog

Achieve state-of-the-art inference latencies with speculative decoding

Covers DFlash: Block Diffusion for Flash Speculative Decoding

How Modal and Decagon worked together to cut inference latency, and how you can too.
