🤖 AI | lmsys.org Blog

DFlash and Spec V2 Decoding (14 minute read)

Using Modal and Z Lab's DFlash speculative decoding models with SGLang’s Spec V2 engine, which is now the default, you can achieve state-of-the-art latencies for LLM inference serving. Our new, jointly-released D...
