Multimodal Function Calling with Gemini 3 and Interactions API

Multimodal function calling lets a tool return an image that the model processes natively, much as it handles an image passed in a prompt. Instead of returning a text description of a file's contents, the tool returns the actual image and Gemini 3 works with it directly.
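As a rough illustration of the idea, the sketch below packages raw image bytes into a tool result instead of a text description. It does not call the Gemini SDK. The helper `build_image_tool_result` and every field name in the dictionary (`type`, `call_id`, `name`, `result`, `mime_type`, `data`) are assumptions made for this example, not the confirmed Interactions API schema, so check the official reference before relying on them.

```python
import base64

# Stand-in for real image bytes: the 8-byte PNG file signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def build_image_tool_result(call_id: str, name: str, image_bytes: bytes,
                            mime_type: str = "image/png") -> dict:
    """Wrap image bytes in a tool-result payload.

    Hypothetical schema: the field names below are illustrative
    assumptions, not taken from the Interactions API reference.
    """
    return {
        "type": "function_result",  # assumed discriminator for a tool result
        "call_id": call_id,         # assumed link back to the model's tool call
        "name": name,               # name of the tool that produced the image
        "result": [
            {
                "type": "image",
                "mime_type": mime_type,
                # Base64 keeps the binary image safe inside a JSON body.
                "data": base64.b64encode(image_bytes).decode("ascii"),
            }
        ],
    }


payload = build_image_tool_result("call-1", "get_chart", PNG_SIGNATURE)
print(payload["result"][0]["mime_type"])
```

The point of the sketch is the shape of the return value: the tool hands back the image itself, tagged with its MIME type, so the model receives pixels to interpret and not a lossy summary written by the tool.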