# Giving an LLM Eyes and Hands on a Mobile Simulator
## The interface a human uses

When a person does QA in tapflow, the loop is:

1. Look at the simulator screen
2. Decide what to do (tap, swipe, type)
3. Do it
4. Look again

This is exactly the perception-action loop that vision-capable LLMs are built for. The model sees a screenshot, reasons about what it shows, decides what action to take, and calls a tool to execute it.

We didn't need to build a new automation layer. We just needed to expose tapflow's existing WebSocket and REST APIs as MCP tools.

What the...
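The look, decide, act, look-again loop described above can be sketched in a few lines of Python. This is a minimal illustration, not tapflow's code: `FakeSimulator`, `Action`, `run_loop`, and `scripted_policy` are hypothetical names, the simulator is an in-memory stand-in rather than a WebSocket or REST client, and the "model" is a scripted function standing in for a vision-capable LLM choosing a tool call.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Action:
    """One step requested by the policy. Field names are illustrative only."""
    kind: str                         # "tap", "swipe", "type", or "done"
    args: dict = field(default_factory=dict)


class FakeSimulator:
    """In-memory stand-in for a simulator connection (an assumption, not a real API)."""

    def __init__(self) -> None:
        self.log: list[Action] = []

    def screenshot(self) -> bytes:
        # A real client would return image bytes; a frame counter is enough here.
        return f"frame-{len(self.log)}".encode()

    def perform(self, action: Action) -> None:
        # Record the action instead of driving a device.
        self.log.append(action)


def run_loop(sim: FakeSimulator,
             decide: Callable[[bytes], Action],
             max_steps: int = 10) -> int:
    """Look, decide, act, look again, until the policy returns "done".

    Returns the number of actions performed.
    """
    for step in range(max_steps):
        frame = sim.screenshot()      # look
        action = decide(frame)        # decide
        if action.kind == "done":
            return step
        sim.perform(action)           # act, then loop back to look again
    return max_steps


def scripted_policy(frame: bytes) -> Action:
    """Placeholder for the model call: tap once, type once, then stop."""
    if frame == b"frame-0":
        return Action("tap", {"x": 120, "y": 300})
    if frame == b"frame-1":
        return Action("type", {"text": "hello"})
    return Action("done")


sim = FakeSimulator()
steps = run_loop(sim, scripted_policy)
print(steps, [a.kind for a in sim.log])
```

In a real setup, `decide` would send the screenshot to the model and translate its tool call into an `Action`, and `perform` would forward that action to the simulator. The `max_steps` cap is a design choice in this sketch that keeps a confused policy from looping indefinitely.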