Computer use in Gemini 3.5 Flash
Pretty doubtful about computer-use/screenshot-based approaches. With Retriever AI, we construct custom accessibility trees to represent web pages, and we just switched over to DeepSeek v4 Flash, where it's nearing a 100x cost decrease. We also had great success just reverse engineering the underlying APIs of websites and then writing code to hit them. Using screenshots to take actions on a webpage, just to trigger the network calls the website is already making underneath, seems too naive.
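To make the accessibility-tree idea concrete, here is a minimal sketch of reducing a page to a short text outline of its interactive elements, which a model can read far more cheaply than a screenshot. This is not Retriever AI's actual representation: the `ROLES` mapping, the `InteractiveOutline` class, and the `[index] role "name"` output format are all assumptions made for illustration, using only Python's standard `html.parser`.

```python
from html.parser import HTMLParser

# Hypothetical tag-to-role mapping, chosen for illustration only.
ROLES = {"a": "link", "button": "button", "input": "textbox",
         "select": "combobox", "textarea": "textbox"}
VOID_TAGS = {"input"}  # elements that never have a closing tag


class InteractiveOutline(HTMLParser):
    """Collects interactive elements as role/name nodes in document order."""

    def __init__(self):
        super().__init__()
        self.nodes = []   # every interactive element seen so far
        self._open = []   # stack of (tag, node) still collecting text

    def handle_starttag(self, tag, attrs):
        if tag not in ROLES:
            return
        a = dict(attrs)
        label = a.get("aria-label") or a.get("placeholder") or ""
        node = {"role": a.get("role") or ROLES[tag],
                "name": label,
                "labelled": bool(label)}  # explicit label beats inner text
        self.nodes.append(node)
        if tag not in VOID_TAGS:
            self._open.append((tag, node))

    def handle_endtag(self, tag):
        if self._open and self._open[-1][0] == tag:
            self._open.pop()

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse whitespace
        if text and self._open:
            node = self._open[-1][1]
            if not node["labelled"]:
                node["name"] = (node["name"] + " " + text).strip()


def outline(html):
    """Return one compact line per interactive element."""
    parser = InteractiveOutline()
    parser.feed(html)
    parser.close()
    return [f'[{i}] {n["role"]} "{n["name"]}"'
            for i, n in enumerate(parser.nodes)]


page = ('<div><a href="/x">Home</a>'
        '<button aria-label="Search">go</button>'
        '<input placeholder="Email"></div>')
print("\n".join(outline(page)))
```

The model would then pick an action by index (say, "click [1]") instead of guessing pixel coordinates. A real system would presumably also handle visibility, nesting, and dynamically rendered content, which this sketch ignores.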