VLLM Predicted Outputs
cascadetech.ai · 8h

Have you ever asked an AI agent to make a simple change to a large piece of code, only to find yourself sitting idly by while the LLM regurgitates pages and pages of code you've already written, with just a few small changes made? Did you wonder, "WHY does it have to regenerate all of this code token by token? Can't it just regenerate the pieces that have changed?"

The answer is YES, with Predicted Outputs. Predicted Outputs is a technique in LLM generation that uses a prediction of the model's output so the LLM can skip sections it already "knows about" and generate only new tokens. The prediction only needs to match partially: if a little bit matches, generation speeds up a little; if a lot matches, the speedup is dramatic.
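To make the idea concrete, here is a minimal sketch of the verify-and-accept loop behind Predicted Outputs. It is a toy illustration, not the vLLM API: the character-level "model" and the function names are invented for this example, and it assumes greedy decoding with the prediction aligned by output position. The key accounting trick is that a run of predicted tokens the model agrees with is verified in a batch and counted as one sequential model call, mirroring how a real engine scores all predicted positions in a single forward pass.

```python
# Toy sketch of predicted-outputs decoding (illustrative only; not the
# actual vLLM implementation or API).

def make_toy_model(target: str):
    """Stand-in for a greedy LLM: always emits the next character of
    `target`, or "" when generation is finished."""
    def next_token(context: str) -> str:
        return target[len(context)] if len(context) < len(target) else ""
    return next_token

def generate_with_prediction(model_next, prediction: str):
    """Greedy decoding that consults a user-supplied prediction.

    A run of predicted tokens the model agrees with counts as ONE
    sequential call (they are verified together), which is where the
    speedup over plain token-by-token decoding comes from."""
    out, calls = "", 0
    while True:
        # Try to accept a run of predicted tokens in one batched call.
        accepted = ""
        for ch in prediction[len(out):]:      # align by output offset
            if model_next(out + accepted) == ch:
                accepted += ch
            else:
                break
        if accepted:
            calls += 1
            out += accepted
        # Fall back to a normal one-token decoding step.
        nxt = model_next(out)
        calls += 1
        if nxt == "":
            return out, calls
        out += nxt

target = "def add(a, b):\n    return a + b\n"
model = make_toy_model(target)
# The prediction matches everything except the function name.
prediction = target.replace("add", "sum")
out, calls = generate_with_prediction(model, prediction)
print(out == target)   # the final output is exact regardless of the prediction
print(calls)           # far fewer sequential calls than plain greedy decoding
```

Note that the output is always exactly what greedy decoding would have produced; a wrong prediction only costs speed, never correctness.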

Predicted outputs is not a common fe…
