Semantic Abstraction: Open-World 3D Scene Understanding from 2D Vision-Language Models
Project page: semantic-abstraction.cs.columbia.edu

Conference on Robot Learning 2022

Our approach, Semantic Abstraction, unlocks 2D VLMs’ capabilities for 3D scene understanding. Trained on a limited synthetic dataset, our model generalizes to unseen classes in a novel domain (i.e., the real world), even for small objects like “rubiks cube”, long-tail concepts like “harry potter”, and hidden objects like the “used N95s in the garbage bin”. Unseen classes are bolded.


Paper & Code

Latest paper version: arXiv, GitHub


Abstraction via Relevancy

Our key observation is that 2D VLMs’ relevancy activations roughly correspond to their confidence of whether and where an object is in the scene.
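As a rough illustration of this idea, the sketch below extracts a coarse relevancy map from an off-the-shelf 2D VLM. It is not the paper’s pipeline or relevancy extractor: the checkpoint (Hugging Face’s CLIP ViT-B/32), the gradient × activation formulation, the layer choice, and the normalization are all assumptions made for the example.

```python
# A minimal sketch of "relevancy as object confidence": gradient x activation
# of the image-text similarity over CLIP's patch tokens, reshaped into a
# coarse spatial map. This is a stand-in for the paper's relevancy extractor;
# the model checkpoint, layer choice, and normalization are assumptions.
import torch
import torch.nn.functional as F
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def relevancy_map(image: Image.Image, prompt: str) -> torch.Tensor:
    inputs = processor(text=[prompt], images=image, return_tensors="pt")

    # Run the vision tower, keeping the penultimate layer's tokens so that
    # gradients through the final attention block reach every patch position.
    vision_out = model.vision_model(
        pixel_values=inputs["pixel_values"], output_hidden_states=True
    )
    tokens = vision_out.hidden_states[-2]          # (1, 1 + N_patches, D)
    tokens.retain_grad()

    # Image-text similarity in CLIP's joint embedding space.
    image_embed = model.visual_projection(vision_out.pooler_output)
    text_embed = model.get_text_features(
        input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"]
    )
    score = F.cosine_similarity(image_embed, text_embed).sum()
    score.backward()

    # Gradient x activation over the spatial patch tokens (CLS dropped),
    # reshaped to the ViT patch grid (7 x 7 for ViT-B/32 at 224 x 224 input).
    grads, acts = tokens.grad[:, 1:, :], tokens.detach()[:, 1:, :]
    rel = (grads * acts).sum(-1).relu().squeeze(0)
    side = int(rel.numel() ** 0.5)
    rel = rel.reshape(side, side)
    return rel / (rel.max() + 1e-8)                # normalize to [0, 1]
```

Upsampling or thresholding such a map gives the coarse “whether and where” signal that the relevancy activations provide before they are lifted to 3D.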
