Understanding 3D space is a key challenge in AI. It sits at the frontier of robotic and agent interaction with the physical world. Humans can easily identify objects, relate them to one another, judge depth, and in most cases have an inherent understanding of physics. This is often termed embodied reasoning. To understand the world as humans do, AI must develop the same capability: inferring the 3D structure of an environment, identifying objects, and planning actions.

Google’s Gemini models are now at the frontier of this newly learned ability. They are learning to “see” in 3D, and they can point to items and plan spatially much as a human would. In this article we’ll explore how LLMs like Gemini understand the 3D world and dive into the different viewpoints of spatial und…
