DFlash and Spec V2 Decoding (14 minute read)
Using Modal and Z Lab's DFlash speculative decoding models with SGLang’s newly default Spec V2 engine, you can achieve state-of-the-art latencies for LLM inference serving. Our new, jointly-released D...