DSpark: Speculative decoding accelerates LLM inference [pdf]
DeepSeek continues not only to push the boundaries but also to publish these incredible papers explaining how they achieved their gains, something the American labs unfortunately no longer do. Chinese labs are doing the most interesting work in AI right now.