Humanoid robots with artificial general intelligence are some years from entering our daily life, but application-specific robotics are already here. From Amazon’s fleet of fulfillment center robots to robotic surgical systems in operating rooms, search and rescue robo-dogs, autonomous drones, and last-mile delivery robots, all the way down to the humble Roomba vacuum cleaner, physical AI systems are getting smarter and more life-like.
Advanced 3D sensors, AI cameras, voice interface, and software-defined lidar play key roles in providing physical AI with eyes, ears, and a mouth via machine learning algorithms. The next step is a large language model (LLM) that functions as the brain, interpreting sensor information and instructing the robot what to do next based on its original training data and an ongoing feedback loop of its learned experience. The goal is for more robots to be a part of daily human life, but how the systems could fail is not yet known.
“It’s too early for us to know how robot and human interactions will play out,” said Sathishkumar Balasubramanian, head of product for IC verification and EDA AI at Siemens EDA. “Most of the things we’re seeing are more of a prototype. I haven’t seen anything that is real in terms of being mass use, but what we are seeing is that people are very cautious in how they do it, how they interact, because it all comes from understanding the automated robot perspective of the object it is interacting with, be it a human or a water bottle or something else. In physical AI, the first thing to understand is how to interpret the physical surroundings and understand the different attributes of the physical surroundings. It needs to know everything, so that’s where most of the effort is happening. Second, there’s a lot more progress on the sensor side of things. How does the physical AI system sense? All these sensors and biosensors need to understand that. I would say it’s still at a very nascent stage right now, and unless something really changes in the next five or 10 years, it will be a very slow market evolution.”
Others agree that the industry is just at the start of a steep learning curve. “Both for edge AI and physical AI, we’re at the beginning of growth,” said Hezi Saar, executive director, product management, mobile, automotive, and consumer IP at Synopsys. “Cloud AI is probably not declining, but it won’t be growing at the same pace once you get the capacity to provide services to consumers using physical AI. It needs to be in our hands, or vehicles, or in robots, and it needs to be affordable and low power. And it needs to be there for us. We’re seeing growth in the demand, and more SoCs are implementing those functionalities than they used to. We haven’t reached a stable point. Even in automotive, I don’t think we’ve reached a stable point where we can say, ‘All right, we’ve got it. The car can identify camels, birds, everything in the night.’ The pathfinding is more stable, more mature, but it’s not there yet. In edge AI and physical AI, we’re just at the beginning. That’s why first movers could go and grab more market share and get hooked on the right thing. Will there be an iPhone moment? I believe so. It’s like, ‘Oh, I don’t need to type it in. I don’t need to do this. This is what I want.’”
Physical AI devices are rapidly evolving, and so are the AI models. “AI is shifting from a tool to a companion, and expectations are growing, defining consumer choices,” said Chris Berge, senior vice president and general manager of Arm’s client line of business. “This shift is being driven by major advances in large language models and agentic AI. These aren’t static models anymore. They’re dynamic systems that reason, plan, and take action on your behalf. The result is interactions that feel less like commands and more like collaboration. We have moved from AI being a parlor trick to influencing how things get done.”
The dynamic between humans and robots, or any kind of automated system or autonomous system, is what makes the physical AI sector more challenging than most. “With robotics, there may still be a human in the loop somewhere, or there might be no human operator in the loop because we’re using a robot to do the function end-to-end,” said Andrew Johnson, engineering and technology leader, systems and functional safety engineering specialist at Imagination Technologies. “It might be a hazardous environment, high temperature, high radiation, or otherwise. Stick a robot in there. But the attribute of human factors shouldn’t be overlooked. Sometimes it’s the dynamic of how a human is interfacing with a machine that is evolving because of machine learning, with the intent to make the human’s life easier.”
Just as AI models will continue to get smarter, safety engineers will need to keep upskilling. “We need to think about competency management and focus on people, processes, and development frameworks, not just the tools and technology,” said Johnson. “If people did that, we would be seeing a lot more efficiency and effectiveness in the use of AI/ML in terms of the tools for design development and verification, as well as the technology used in the product to do some kind of automated or autonomous decision making.”
In terms of safety, the central concern is for the robot to correctly identify a human and not harm them. “The lockout zones around robots are serious,” said David Garrett, vice president of technology and innovation at Synaptics, citing a 2022 incident. “A robot hand was playing chess, and the kid didn’t move his hand before the robot made its move, and it came in and basically broke his finger.”
The Internet is full of other incidents involving robots in lab, factory, and even festival settings.
Physical AI systems that use LLMs as part of their larger cognitive framework add an extra dimension to safety and security. “Rigorous testing needs to be performed not just on the hardware or other processors but also on the AI,” said Rajesh Velegalati, principal security analyst at Keysight Technologies. “We feel that not enough security review is being conducted on the AI being deployed in various fields, especially those used in robots or drones. I often observe that for the same prompt, I don’t get exactly the same output. Obviously, the AI is learning, so the response may change or be entirely different, but this should not be the case in safety-critical applications. It also needs to have longer memory retention. This would mean implementing a strong fallback option in case of failure. There also should be rule enforcement designed for when the AI is interfacing with hardware parts during runtime, to act as a guardrail.”
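The kind of runtime rule enforcement Velegalati describes can be sketched as a thin software layer sitting between the model's proposed action and the hardware interface. The Python sketch below is purely illustrative: the Action type, JOINT_LIMITS, propose_action, and safe_stop are hypothetical placeholders, not any vendor's API.

```python
# Illustrative runtime guardrail between a model's proposed action and the
# hardware interface. All names here (Action, JOINT_LIMITS, propose_action,
# safe_stop) are hypothetical placeholders, not any vendor's API.
from dataclasses import dataclass

@dataclass
class Action:
    joint: str
    velocity_mm_s: float  # commanded speed
    force_n: float        # commanded force

# Hard limits enforced outside the model, so a bad inference cannot exceed them.
JOINT_LIMITS = {"gripper": {"velocity_mm_s": 250.0, "force_n": 40.0}}

def within_limits(action: Action) -> bool:
    limits = JOINT_LIMITS.get(action.joint)
    if limits is None:
        return False  # unknown joint: reject by default
    return (abs(action.velocity_mm_s) <= limits["velocity_mm_s"]
            and abs(action.force_n) <= limits["force_n"])

def safe_stop() -> Action:
    # Fallback when the model output is rejected or a person is in the zone.
    return Action(joint="gripper", velocity_mm_s=0.0, force_n=0.0)

def guarded_step(propose_action, human_in_zone: bool) -> Action:
    if human_in_zone:
        return safe_stop()         # rule enforced regardless of model output
    action = propose_action()      # proposal from the LLM/planner
    if not within_limits(action):
        return safe_stop()         # reject out-of-bounds commands
    return action
```

The point is structural: the limits and the fallback live outside the model, so nondeterministic or adversarial model outputs cannot bypass them.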
Verifying physical AI hardware and software
While much of the focus is on the software-enabled features, none of this works without the underlying semiconductor hardware. Chips in physical AI may be used across a wide range of ambient temperatures and conditions, and they may be heavily utilized at times and dormant at others. That increases the challenges in designing those chips, and especially in verifying them.
Verification engineers should be using every tool in their arsenal — simulation, emulation, formal verification, and even digital twins. “Physical AI systems are so complex that you need to leverage things like formal verification to ensure that things like arbiters are working correctly at the very lowest level,” said Matthew Graham, senior group director, verification software product management at Cadence. “You need to leverage hardware-software co-verification, as well. If you’re not using all of the various techniques, you’re probably missing out.”
In the past, engineers could pick and choose the verification technologies they utilized. “You could say, ‘We’re not really bothering with formal, and we don’t really need to do too much power-aware, because we’re not really in that area,’” said Graham. “The reality is that every microchip needs to be aware from a power perspective. It may not be a low-power device, but they certainly want to optimize the power that they’re consuming because power is very much finite — especially now in the time of AI. Every device needs to ensure that at a low level, it’s functionally correct on some of its very fundamental things.”
Certain types of physical AI, such as those in mil/aero, will require extra safety and security measures. “In semiconductors, we always consider it a very statistical analysis problem,” said Graham. “We consider things like manufacturing defect rates. Or if I send a signal along a path, what is the probability that signal is received at the other end with the same integrity that it was sent? Do I need a double check of that? Do I need a triple check? Also, what’s the level of statistical tolerance? For something as simple as a telephone call, there might be a certain level of reliability. But for something like robot soldiers, the reliability needs to be significantly higher for particular scenarios.”
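Graham's point about statistical tolerance can be made concrete with a short calculation. The sketch below uses a made-up per-transfer error rate to show how a double or triple check (here, majority voting across redundant channels) drives down the probability of accepting a corrupted signal; the numbers are illustrative, not measured.

```python
# Toy reliability arithmetic: probability that redundant, independently
# corrupted channels out-vote the correct value. Error rates are made up.
from math import comb

def majority_vote_failure(p_error: float, n_channels: int) -> float:
    """Probability that a majority of n independent channels are wrong."""
    k_min = n_channels // 2 + 1
    return sum(comb(n_channels, k) * p_error**k * (1 - p_error)**(n_channels - k)
               for k in range(k_min, n_channels + 1))

p = 1e-4  # hypothetical single-channel error probability
print(f"single channel:     {p:.1e}")
print(f"triple with voting: {majority_vote_failure(p, 3):.1e}")  # roughly 3e-8
print(f"5-way with voting:  {majority_vote_failure(p, 5):.1e}")  # roughly 1e-11
```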
The same concerns apply to the deep learning algorithms used in physical AI systems. “If an LLM is inside a robot, it’s because it’s supposed to reason about the world,” said Mike Borza, principal security technologist at Synopsys. “It understands the world in general by having sensors to experience the world, and then actuators that can modify the world in which it operates. The LLM serves a purpose within that robot. The safety concerns are fundamentally about the robot acting out of control or in a way that threatens people or other things around it.”
How an LLM is trained has a significant impact on safety. “Today’s LLMs are trained first against a very large data set, then refined and reinforced with human interaction,” said Scott Best, senior technical director, Security IP at Rambus. “These systems can then be further trained with specialized data to become a reasoning model suitable for ‘chain of thought’ agentic applications, creating its own stimulus in response to measured outcomes in a closed-loop manner.”
Engineers, therefore, need to act as guardrails for both the physical AI systems and the LLMs. “Humans can take a wrong turn and can use it in the wrong way, so they need to make sure and prevent that from happening,” Synopsys’ Saar observed.
Simulation to train LLMs and physical AI
Many physical AI systems, such as humanoid robots or robot dogs, are created to serve as support tools for the visually impaired and the elderly, or to assist first responders.
“Of course, like any technology, people can exploit robots,” said Ransalu Senanayake, assistant professor in the School of Computing and Augmented Intelligence at Arizona State University, and director of the Laboratory for Learning Evaluation and Naturalization of Systems (LENS Lab). “But what we try to do is make them useful.”
The LENS Lab robotic systems initially were connected to LLMs, then moved to computer vision (CV) models and vision language models (VLMs), and now they operate on vision-language action (VLA) models. AI auditing tools are used to monitor and improve the LLMs’ decision-making process.
“All these large language models — what I call large document models — are vulnerable to hallucination,” said Senanayake. “What is important is controlled hallucination, where we know what they are thinking in new ways. We should be able to assess, and if they are bad, we should be able to discard. If they are good, we should be able to move forward with them.”
These LLMs were trained on standard models, and the models were then adapted to their scenarios. “We do simulation,” said Senanayake. “We have models of the humanoid, we have the models of the robot dog, and we can have thousands of dogs or thousands of humans at the same time and train them in parallel on GPUs. Once you find good data for the model, we can transfer it to the real one.”
Fig. 1: ASU’s Ransalu Senanayake with the humanoid robot and robotic dog developed in the LENS Lab. Source: ASU
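In spirit, the parallel GPU training Senanayake describes is a vectorized rollout loop: many simulated robots stepping and learning at once, with the result transferred to the real machine. The NumPy sketch below is a framework-agnostic illustration only; the placeholder dynamics, policy, and reward stand in for a real physics engine and learning algorithm, and none of it reflects the LENS Lab's actual pipeline.

```python
# Minimal illustration of batched (vectorized) simulation rollouts.
# Dynamics, policy, and reward are invented placeholders, not a real robot model.
import numpy as np

N_ROBOTS, STATE_DIM, ACTION_DIM = 1024, 8, 2  # many robots stepped in parallel

def policy(states: np.ndarray, weights: np.ndarray) -> np.ndarray:
    return np.tanh(states @ weights)  # one simple linear policy applied to every robot

def step_batch(states: np.ndarray, actions: np.ndarray) -> np.ndarray:
    # Placeholder dynamics; a real setup would call a physics engine here.
    padded = np.pad(actions, ((0, 0), (0, STATE_DIM - ACTION_DIM)))
    return 0.99 * states + 0.01 * padded

rng = np.random.default_rng(0)
states = rng.normal(size=(N_ROBOTS, STATE_DIM))
weights = rng.normal(scale=0.1, size=(STATE_DIM, ACTION_DIM))

for _ in range(100):  # one batched rollout across all simulated robots
    actions = policy(states, weights)
    states = step_batch(states, actions)

reward = -np.linalg.norm(states, axis=1)  # toy objective: end near the origin
print("mean reward across all simulated robots:", reward.mean())
```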
Likewise, NVIDIA uses simulation to train its Groot N1 open, customizable, cross-embodiment robot foundation model. “Unlike LLM models that can be trained on all human knowledge available on the internet, no such data exists for training physical AI models. Real-world data is costly and potentially dangerous to capture, and pre-training only goes so far to train models like Groot,” said Rev Lebaredian, vice president of Omniverse and simulation technology at NVIDIA. “We need a scalable and cost-effective way to generate large, diverse, and physically accurate data.”
To that end, the latest version of Groot, N1.6, will use the company’s Cosmos Reason World Foundation Model (WFM) as a brain, allowing humanoids to break down complex instructions and execute tasks using prior knowledge and common sense. Meanwhile, new versions of its Cosmos Transfer and Cosmos Predict WFMs will let humanoids move and handle objects with more torso and arm freedom. “These models enable the generation of hundreds of virtual sensor-rich environments for robot training, reducing reliance on real-world data collection,” said Lebaredian. The company also announced the beta release of Newton, an open-source, GPU-accelerated physics engine to simulate extremely complex robot actions.
Simulation plays a key role in ensuring a robot is functionally safe before it is let loose on the public. “That’s why it’s so critical that we build physically accurate simulators that match the world as closely as possible, so that we can test these robots and fleets of robots for millions and millions of hours of operation in diverse environments before we actually put them out in the physical world and give them a physical body,” Lebaredian observed.
Simulation is particularly useful for training physics-based or physics-informed AIs, which need to navigate a factory floor, for example. “Physics is good for creating synthetic data, because physics is physics,” said Marc Swinnen, director of product marketing at Ansys, now part of Synopsys. “It’s different than addresses of people or incomes or something which is not physics-based, and you really need real data to train the AI. We can simulate physics no matter what, so physics-based AIs have that advantage. We can train AI models and provide them to the customer without using any of their data or anybody else’s data. Heat flow through silicon is heat flow through silicon – it doesn’t matter who the customer is. That makes it easier.”
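As a deliberately simplified illustration of that idea, the sketch below generates training data from a small 1D heat-diffusion solver and fits a surrogate model on it, with no customer data involved. The solver and the least-squares surrogate are stand-ins for real thermal simulation and a real neural network; nothing here reflects Ansys' actual tools.

```python
# Synthetic data from physics: a 1D heat-diffusion solve generates training
# samples, and a tiny surrogate model is fit on them. Purely illustrative.
import numpy as np

def simulate_midpoint_temp(source_temp: float, alpha: float,
                           n_cells: int = 50, n_steps: int = 500) -> float:
    """Explicit finite-difference solve; left end held at source_temp, right at 0."""
    temps = np.zeros(n_cells)
    for _ in range(n_steps):
        temps[0], temps[-1] = source_temp, 0.0
        temps[1:-1] += alpha * (temps[2:] - 2 * temps[1:-1] + temps[:-2])
    return temps[n_cells // 2]

# Generate synthetic training data purely from the physics model.
rng = np.random.default_rng(1)
X = np.column_stack([rng.uniform(50, 150, 200),     # source temperature
                     rng.uniform(0.05, 0.4, 200)])  # diffusion coefficient
y = np.array([simulate_midpoint_temp(t, a) for t, a in X])

# Fit a simple least-squares surrogate (a real flow might use a neural network).
A = np.column_stack([X, X[:, 0] * X[:, 1], np.ones(len(X))])
coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
pred = A @ coeffs
print("surrogate RMS error (deg):", np.sqrt(np.mean((pred - y) ** 2)))
```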
Safety measures for physical AI with LLMs
ASU’s LENS Lab has developed several safety measures to ensure both the safety of the physical AI interacting with a human and the connected AI model. “It’s not one thing that we can apply, because these neural networks are really black-box models, and we don’t understand them like traditional models,” said Senanayake. “You can’t go inside them and inspect.”
Measures to ensure the safety of an LLM-guided physical AI include:
- Creating red teams and failure discovery: Identifying vulnerabilities, adversarial risks, and failure modes before deployment; developing techniques for stress-testing AI models, uncovering weaknesses, and auditing behavior to improve robustness and reliability (a simple stress-test sketch appears below).
- Reasoning and interpretability: Applying explainable AI to physical intelligence, providing clear and actionable explanations for AI decisions during deployment; focusing on explainability, causal reasoning, and uncertainty quantification to enhance trust and transparency in AI systems.
- Fixing and improving AI systems: Mitigating failures, adapting to distribution shifts, and enhancing robustness post-deployment; exploring adaptive learning, reinforcement-based corrections, and real-world fine-tuning to make AI systems more resilient and reliable.
Fig. 2: Ensuring reliability, trustworthiness, and interpretability of physical AI. Source: ASU’s LENS Lab
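A bare-bones version of the red-teaming item above might look like the following: sweep a grid of perturbed scenarios, log every case where the planner's output violates a safety predicate, and hand the failures back for analysis and retraining. The scenario fields, the plan() policy, and the safety rule are hypothetical placeholders; real failure discovery would search the space far more intelligently, for example with the reinforcement-learning approach described below.

```python
# Toy red-team / failure-discovery sweep: search perturbed scenarios for
# safety violations. The scenario fields and plan() are invented placeholders.
import itertools

def plan(fog_density: float, speed_mps: float) -> float:
    # Placeholder policy: estimated stopping distance with a fog-dependent
    # reaction time and a fixed 4 m/s^2 deceleration.
    reaction_time_s = 0.5 + 1.5 * fog_density
    return speed_mps * reaction_time_s + speed_mps**2 / (2 * 4.0)

def violates_safety(obstacle_distance_m: float, stopping_distance_m: float) -> bool:
    return stopping_distance_m > obstacle_distance_m  # would stop too late

failures = []
for dist, fog, speed in itertools.product([5, 10, 20], [0.0, 0.5, 0.9], [2, 5, 8]):
    if violates_safety(dist, plan(fog, speed)):
        failures.append({"obstacle_m": dist, "fog": fog, "speed_mps": speed})

print(f"{len(failures)} failing scenarios logged for review and retraining")
```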
Unexpected failures can occur due to hazardous conditions for a robot, such as a self-driving vehicle in foggy weather, or due to an LLM’s complex decision space around moral biases and ambiguity. “We showed over 50 different types of biases,” said Senanayake. “It could be about impairments, certain political views, or educational background. We can’t enumerate everything and test it, because it’s a million gazillion different combinations. That’s why we use deep reinforcement learning to try a bunch of different things much more smartly.”
Another cause of failure is a domain shift. “This is when the environment changes over time,” said Senanayake. “Typically, humans are good at adjusting. But if the models that we develop are over-fitted to data, we see they will perform very badly when something new comes along, like a self-driving car seeing traffic cones. You put something in the middle of the traffic cone, and then it goes crazy. That’s a temporary domain shift. Domain shifts can happen slowly at different paces. They can be cyclic. They can be temporary. They can be forever.” Similarly, the LLM may struggle if it detects something out of distribution, or different from the data set it was trained on, he noted.

Fig. 3: A robotic dog meets a real one. Source: ASU
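One common way to handle the out-of-distribution case Senanayake mentions is to score how far each incoming feature vector sits from the training distribution and fall back to conservative behavior when it is too far out. The sketch below uses a simple Mahalanobis-distance check; the feature dimensions, threshold, and fallback action are illustrative assumptions, not any particular lab's detector.

```python
# Simple out-of-distribution check: flag inputs far (in Mahalanobis distance)
# from the training feature distribution and fall back. Threshold is illustrative.
import numpy as np

rng = np.random.default_rng(2)
train_features = rng.normal(size=(5000, 16))  # stand-in for in-distribution features

mean = train_features.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(train_features, rowvar=False) + 1e-6 * np.eye(16))

def mahalanobis(x: np.ndarray) -> float:
    d = x - mean
    return float(np.sqrt(d @ cov_inv @ d))

THRESHOLD = 6.0  # would be tuned on held-out in-distribution data

def act(features: np.ndarray) -> str:
    if mahalanobis(features) > THRESHOLD:
        return "fallback: slow down and request human assistance"
    return "proceed with learned policy"

print(act(rng.normal(size=16)))        # typical input  -> proceed
print(act(rng.normal(size=16) + 8.0))  # shifted input  -> fallback
```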
Conclusion
Automation and various forms of robotics are already entrenched in some industries, but this is just the beginning of seeing physical AI systems in everyday life. The pinnacle will be when humanoid robots and lifelike robotic animals have an approachable appearance, can move easily over most terrain, and are connected to highly customized AI models.
Safety frameworks are critical as physical AI complexity increases. “We’re dealing with big data, a complex suite of input data from various sources, complex models, complex algorithms, and the tools are inherently complex,” said Johnson. “Coupled to that, the general software stacks are complex, and you need computationally intensive hardware to deal with the real-time processing of that complexity.”
Activity is growing across the industry. Changes are coming, and over the next few years robots will be deployed on a much larger scale, moving them from novelty to everyday encounter. The challenge now is to ensure the integration of this new technology goes as smoothly as possible.
Related Reading
Physical AI Chip Sales Won’t Rival GenAI Anytime Soon
Robot exuberance is premature. Application-specific machines are the near future, with humanoids after 2035.
Physical Access Control Raises New Security Concerns
Small language models, longer device lifetimes, and thermal manipulation make securing hardware much more challenging.