As AI technologies advance, truly helpful agents will become capable of better anticipating user needs. For mobile experiences to be genuinely helpful, the underlying models need to understand what the user is doing, or trying to do, as they interact with the device. Once the current and previous tasks are understood, the model has more context with which to predict likely next actions. For example, if a user previously searched for music festivals across Europe and is now looking for a flight to London, the agent could offer to find festivals in London on those specific dates.

Large multimodal models are already quite good at inferring user intent from a user interface (UI) trajectory. But using LLMs for this task would typically require sending information to a server, which can be s…
