Achieve state-of-the-art inference latencies with speculative decoding
How Modal and Decagon worked together to cut inference latency, and how you can too.
Read the original article