Multimodal Function Vectors for Spatial Relations
arxiv.org·6h

Abstract: Large Multimodal Models (LMMs) demonstrate impressive in-context learning abilities from limited multimodal demonstrations, yet the internal mechanisms supporting such task learning remain opaque. Building on prior work on large language models, we show that a small subset of attention heads in the vision-language model OpenFlamingo-4B is responsible for transmitting representations of spatial relations. The activations of these attention heads, termed function vectors, can be extracted and manipulated to alter an LMM’s performance on relational tasks. First, using both synthetic and real image datasets, we ap…
