vLLM Predicted Outputs
cascadetech.ai · 17w
Discuss: Hacker News

Have you ever asked an AI agent to make a simple change to a large piece of code, only to find yourself sitting idly by while the LLM regurgitates pages and pages of code you’ve already written, with just a few small changes made? Did you wonder, “WHY does it have to regenerate all of this code token by token? Can’t it just regenerate the pieces that have changed?”

The answer is YES, with Predicted Outputs. Predicted Outputs is a technique in LLM generation that uses a prediction of the model’s output so the LLM can skip sections it already ‘knows about’ and generate only the new tokens. The prediction only needs to match partially: if a little bit matches, generation speeds up a little bit; if a lot matches, the speedup is dramatic.
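To make the idea concrete, here is a minimal, hedged sketch of the verification loop behind predicted outputs. It is not vLLM’s implementation: a deterministic toy function stands in for the model, greedy decoding is assumed, and on a mismatch the rest of the prediction is simply discarded (real systems try to re-align instead). The point it illustrates is the core mechanic: each forward pass can verify many predicted tokens at once, so a good prediction collapses many passes into one.

```python
def speculative_generate(next_token, prediction):
    """Greedy decoding with a predicted output (simplified sketch).

    next_token(context) -> the model's next token, or None to stop.
    Each simulated forward pass verifies the remaining predicted tokens
    in bulk and always emits one model-chosen token, so a matching
    prediction collapses many passes into one.
    """
    out, draft, passes = [], list(prediction), 0
    while True:
        passes += 1
        # Accept the longest prefix of the draft the model agrees with.
        while draft and next_token(out) == draft[0]:
            out.append(draft.pop(0))
        if draft:          # mismatch: give up on the rest of the prediction
            draft.clear()  # (real systems attempt to re-align instead)
        tok = next_token(out)  # every pass still yields one verified token
        if tok is None:
            return "".join(out), passes
        out.append(tok)


TARGET = "def add(a, b):\n    return a + b\n"

def toy_model(context):
    """Deterministic stand-in for an LLM: it reproduces TARGET."""
    return TARGET[len(context)] if len(context) < len(TARGET) else None

# Perfect prediction: the whole output is verified in a single pass.
text, passes = speculative_generate(toy_model, TARGET)
print(passes)   # 1

# No prediction: one forward pass per generated token.
text, passes = speculative_generate(toy_model, "")
print(passes)
```

With a perfect prediction the toy run finishes in one pass; with no prediction it needs one pass per token, which is exactly the “regenerate everything token by token” cost the paragraph above describes. A partially correct prediction lands in between.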

Predicted outputs is not a common fe…
