Connecting the dots: AI has mostly been confined to the virtual world, but it is now learning from the physical mechanics of everyday life, driven by a global surge in data collection and annotation. From manufacturing hubs in southern India to research labs around the world, data specialists are building the foundations for AI systems capable of practical, real-world tasks.
In the textile city of Karur, Naveen Kumar begins his day not by writing code, but by performing hundreds of precise hand movements to fold towels. With a GoPro camera attached to his forehead, he records every gesture, from the way his fingers grip the fabric to the sequence of folds and placement of each item.
These recordings are not intended for social media or training manuals. Instead, they power Objectways, a data labeling company that supplies annotated training materials to robotics and generative AI clients around the world. Objectways, which has more than 2,000 employees, divides its work between annotating sensor data for autonomous vehicles and robotics and providing performance feedback for generative AI systems.
Errors are common. Kumar and his colleagues often discard hundreds of recordings due to missed steps or misplaced items. Each video is meticulously labeled, with annotation teams outlining moving parts, tagging objects, and classifying specific gestures. This painstaking work is vital: it gives machine learning models granular context about physical actions, helping algorithms learn everything from the motion of an arm to the exact pressure needed to fold a towel without creases.
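To make the shape of this training data concrete, here is a minimal sketch of what one annotated segment of such a recording might look like, assuming a simple record per labeled gesture. The field names are hypothetical, not Objectways' actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class GestureAnnotation:
    """One labeled segment of a recorded towel-folding video.

    Illustrative fields only; they do not reflect Objectways'
    internal annotation format.
    """
    video_id: str
    start_frame: int
    end_frame: int
    gesture: str                    # e.g. "grip", "fold", "place"
    objects: list[str] = field(default_factory=list)   # tagged items in frame
    bounding_boxes: list[tuple[int, int, int, int]] = field(default_factory=list)
    discard: bool = False           # flagged when a step was missed

# A single annotated segment from a head-mounted recording:
segment = GestureAnnotation(
    video_id="karur_towel_0042",
    start_frame=120,
    end_frame=188,
    gesture="fold",
    objects=["towel", "left_hand", "right_hand"],
    bounding_boxes=[(310, 220, 540, 470)],
)
```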
International firms recognize the value of this type of data. Ulrik Stig Hansen, cofounder of Encord, a platform in San Francisco that manages large annotation projects, told the Los Angeles Times that robotics is experiencing a resurgence as companies compete to develop foundation models built for physical tasks.
Encord works with Objectways and similar firms to collect demonstration data for clients, including Physical Intelligence and Dyna Robotics. Major companies such as Tesla, Boston Dynamics, Nvidia, Google, and OpenAI are moving ahead, betting that well-curated training sets based on human activity will push their robotic systems toward greater autonomy and flexibility.
Nvidia, for example, estimates that the global market for humanoid robots could reach $38 billion within the next ten years. Alongside well-known names, dozens of emerging firms provide hardware, simulation tools, and annotated data to speed up the development of multipurpose robots for mass markets.
Unlike large language models, which process vast amounts of online content to imitate speech, reasoning, and visual understanding, robotics training depends on first-person demonstration data, such as the footage recorded by people like Kumar as they perform routine tasks with precision.
Collecting real-world physical data has become an industry of its own. Some companies use teleoperation, where humans remotely guide robots through specific actions. According to Ali Ansari, founder of Micro1, advances in remote data collection now allow trainers on one continent to control robots on another, with movement data streamed and analyzed for successes and mistakes.
Operators are now working in centralized “arm farms” in Eastern Europe, where warehouses are packed with joysticks and teams guiding robots for real-time training. Mohammad Musa of Deepen AI explains that current best practices combine real and synthetic demonstrations based on human-guided sessions and staged environments, with much of this activity still happening outside Western markets.
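These pipelines are proprietary, but a rough sketch of the logging side of a teleoperation session might pair each operator command with the robot's observed state, timestamped so reviewers can later mark successes and mistakes. All names and fields below are assumptions, not any vendor's real format:

```python
import json
import time
from dataclasses import dataclass, asdict

@dataclass
class TeleopStep:
    """One timestep of a remotely guided robot session (hypothetical record)."""
    timestamp: float
    operator_command: list[float]   # e.g. joystick axes or joint targets
    robot_state: list[float]        # e.g. observed joint positions
    success: bool | None = None     # labeled afterward by reviewers

def record_session(steps: list[TeleopStep], path: str = "session.jsonl") -> None:
    # Append each step as one JSON line so analysts can replay
    # the trajectory and annotate successes and mistakes later.
    with open(path, "a") as f:
        for step in steps:
            f.write(json.dumps(asdict(step)) + "\n")

# Example: log two steps of a (simulated) arm-farm session.
record_session([
    TeleopStep(time.time(), [0.1, -0.3], [0.09, -0.28]),
    TeleopStep(time.time(), [0.0, 0.2], [0.01, 0.19], success=True),
])
```

Writing one JSON object per line keeps a session appendable during live operation and easy to replay afterward, when trajectories are labeled.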
Critics question how effective these methods truly are, noting that some teleoperated robots perform well under human control but struggle to act independently in real-world conditions.
Even so, demand for demonstration data continues to grow. Micro1 pays people in Brazil, Argentina, India, and the United States to wear smart glasses and record everyday movements. Figure AI, based in San Jose, has partnered with real estate company Brookfield to capture activity in 100,000 homes.
The project is backed by $1 billion in funding, most of it dedicated to collecting first-person human data. Meanwhile, Scale AI, backed by Meta, has gathered more than 100,000 hours of similar footage for robotics training.
Objectways continues to expand its repertoire, recently documenting and annotating footage of robotic arms manipulating boxes and T-shirts, as well as humanoid robots sorting and folding towels. The scale of annotation work is immense: employees recently processed 15,000 videos of robots performing folding tasks, flagging errors such as garments being tossed rather than carefully folded. “In five or ten years, robots will be able to do all these jobs,” notes Kavin, a veteran on the annotation team.