AI is seen as a massive computation problem, but that is not the case, at least with the way that the problem is structured today. It is a data movement problem. This not only limits performance but represents most of the energy consumption.
In addition, the industry spends most of its time and effort making small improvements that optimize aspects of the existing architecture, when what is really required are major changes to the way the problem is addressed. That is not in the cards today.
The industry is starting to face new problems, such as the availability of power, because consumption is vastly outstripping the rate at which new power can be brought online. “The sheer computational demands of modern applications, from generative AI like GPT-5 to hyperscale cloud services requiring tens to hundreds of thousands of GPUs, have pushed the industry into a critical power crisis,” says Noam Brousard, VP solutions engineering at proteanTecs. “Global data-center electricity consumption is climbing sharply and projected to exceed 1,000 terawatt-hours per year by 2030.”
Additional power is getting scarce. “If you want to get a megawatt of extra power in a week’s time – that is extremely challenging,” says Badarinath Kommandur, fellow at Cadence. “Then there are the operational costs, which are defined by the power utilization of the systems. Hyper-scalers have become extremely focused on optimizing the power efficiency of their systems, all the way from the chips to the system level. That feeds into their capex investment, the opex cost and their profitability, so it’s becoming foremost in the minds of everybody working in this industry.”
AI is also seen as something new, when in fact most of it is built upon twenty-year-old technology. “We see a lot of power wasted using classical compute architectures,” says Andy Heinig, department head for efficient electronics at Fraunhofer IIS’ Engineering of Adaptive Systems Division. “To get new architectures, you need basic research from the universities. New architectures are only possible with a new generation of engineers. They have a totally new mindset and are trained from the beginning in those new architectures. If you train them with these existing ones, you get stuck on that.”
First class citizen

Traditionally, power was an afterthought. The first industry to take it seriously was mobile phones, when battery life became a selling point – oftentimes eclipsing functionality. This drove the development of many tools and techniques to reduce power consumption. Those solutions have now started to migrate into other sectors. “There are two aspects to it,” says Suhail Saif, director of product management and solutions engineering at Keysight. “One is technology readiness and the second is technology adoption. From the EDA side, from the design processes side, the technology is there. Within the mobile industry, they have the expertise, they are doing what is needed to take care of power consumption, power savings, power optimization, because battery life was king.”
For industries that had a reliable source of power, it was not cost effective to spend the time on power reduction, so long as it remained within a reasonable envelope. Today, not only is power becoming a precious resource, but power consumption ends up being a thermal problem, which is also becoming a limiter.
“If you look at client platforms, power is a first-class citizen,” says Cadence’s Kommandur. “We have to think about power from day one. If you look at what’s happening with the AI build-out, two things are extremely critical. The AI build-out will not be limited by availability of compute power, but more by access to power. If you look at projections through to 2030, they predict that AI will represent about 12% of total power consumption in the US. You really need to make sure you can build out the most efficient compute, both from a performance and power perspective, given the power-limited capacity you have in the industry.”
Even in those cases, it is not clear that power is as important as performance. “Power is a soft limit,” says Marc Swinnen, director of product marketing at Ansys, now part of Synopsys. “How much power can you afford? The thing is, you can afford more power dissipation as long as you’re willing to pay the price for cooling it. It’s really an economic trade-off, and more power typically means more speed. Today, the computational demands and the competition within AI place a premium on performance. They want the fastest chips, and if that means spending several thousand more on heat sinks and fans, so be it. Once performance demands plateau, then people will say we can’t afford to have this much power.”
The industry has a tendency to play it safe with margins. “Semiconductor companies are paying for the same problem twice: wasted power and throttled performance,” says proteanTecs’ Brousard. “Fixed voltage guard bands were meant to provide safety, but over time they have become an energy tax baked into every chip. Guard bands assume that every worst case will happen at the same time. In reality, that almost never occurs. Yet the chip is forced to run at this inflated voltage all the time. The result is simple: the chip burns energy it does not need to use.”
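The scale of that energy tax can be sketched with a quick calculation. Dynamic power scales roughly with the square of supply voltage (P ≈ α·C·V²·f), so even a modest fixed margin carries a measurable cost. The voltage and power figures below are illustrative assumptions, not measurements from any particular chip.

```python
# A minimal sketch, assuming dynamic power scales with V^2 (P ~ alpha*C*V^2*f).
# All numbers are illustrative placeholders, not data from a real chip.
V_NOM   = 0.75    # voltage the silicon actually needs for this workload (V)
GUARD   = 0.05    # fixed worst-case guard band baked into the rail (V)
P_NOM_W = 500.0   # package power at V_NOM for this workload (W)

v_applied = V_NOM + GUARD
p_applied = P_NOM_W * (v_applied / V_NOM) ** 2   # V^2 scaling of switching power

print(f"power at guard-banded voltage: {p_applied:.0f} W")       # ~569 W
print(f"energy tax of the fixed guard band: {p_applied - P_NOM_W:.0f} W "
      f"({100 * (p_applied / P_NOM_W - 1):.1f}%)")               # ~69 W (~13.8%)
```

Adaptive-voltage schemes aim to recover that margin by tracking what each die actually needs under its real workload rather than assuming the worst case.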
It is possible that the industry is sitting on a cusp. “I want companies to start looking at power as a first-class citizen,” says Keysight’s Saif. “This used to be the conversation three years ago, but everything is changing, with AI chips and power envelopes now being the dominant factors deciding whether you will be able to do something or not. The front runners have brought about a culture change in terms of considering power to be a primary metric of design. It’s a cultural change, a change in how you think about what will make or break your chip. They’ve made personnel changes. They have made process changes.”
Some of those changes can reap quick rewards. “Extra voltage creates extra heat,” says Brousard. “Extra heat forces thermal protections to kick in. Those protections reduce frequency. The chip slows itself down to survive. The silicon is capable of more, but it cannot deliver it because it was over-volted in the first place. The safety margin that was supposed to protect performance ends up limiting it instead.”
This is a complex problem that the industry may not be ready to adequately address. “When you go to the back-end of the design process, you can optimize that,” says Kommandur. “It varies from design to design and is somewhere in the 10% to 20% range. However, the biggest leverage comes from exploring different architectures for optimally meeting the power and performance targets. Some designs, especially in the DSP data-path-centric domain, have used high-level synthesis to quickly explore different architectural configurations before they commit to a particular architecture.”
This requires a different skill set. “RTL used to be considered the most ‘shift left’ you could do,” says Saif. “But today, power planning and budgeting starts at the architecture step. I see companies that have architecture power performance teams, and they are more heavily focused on power rather than performance. It used to be the other way around. Those teams are also becoming larger than the ones focusing on back-end power optimization.”
While back-end tools and methodologies may be in place, system-level ones are not. “RTL estimates are fairly accurate,” says Kommandur. “With a little bit of tuning, they can come within 10% of implementation, but you have to spend a fair bit of time and effort to tune the RTL-based estimations to back-end implementation. The problem is that higher levels do not have all the implementation details. For example, what is your clock structure? To get this level of detailed implementation knowledge into the front-end space is difficult. What you’re trying to do in the front-end estimation is to explore different implementation options earlier on, so that you have a relative ranking of different implementation schemes. You fix the microarchitecture to meet the power targets.”
An architectural problem

The energy savings required are not in the 10% range. To get larger gains, you have to move up in the system and up in abstraction. “Fully instrumented hardware that can provide power-performance feedback to software developers allows learning and development of best practices,” says Steven Woo, fellow and distinguished inventor at Rambus. “This may include creating and mapping algorithms to hardware, and better understanding the tradeoffs between memory tiering, data storage and retrieval, and recomputation. Using smaller data types where possible, such as FP4, conserves bandwidth and reduces data movement power. Power delivery itself is also changing, as the industry transitions to higher voltages like 48V for more efficient power distribution. Converting down from 48V to semiconductor component levels like 5V and 1V should be done near the components themselves to further reduce power losses.”
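The rationale for the 48V transition is simple resistive-loss arithmetic: conduction loss in the distribution path scales as I²R, so raising the bus voltage cuts the current, and the loss, sharply. The sketch below uses assumed load and resistance values purely for illustration.

```python
# A minimal sketch of distribution loss at different bus voltages.
# The load power and path resistance are assumptions for illustration only.
P_LOAD_W = 10_000.0   # power delivered to one shelf of accelerators (W), assumed
R_PATH   = 0.002      # resistance of the busbars/cabling to that shelf (ohms), assumed

for v_bus in (12.0, 48.0):
    i = P_LOAD_W / v_bus        # current drawn from the bus at this voltage
    loss = i ** 2 * R_PATH      # I^2 * R conduction loss in the distribution path
    print(f"{v_bus:>4.0f} V bus: {i:6.1f} A, conduction loss {loss:7.1f} W")
# 4x the voltage means 1/4 the current and 1/16 the conduction loss, which is also
# why the final step-down to ~1 V is pushed as close to the components as possible.
```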
No part of the entire system should escape a keen focus on power. “If you look at data centers, you find racks where each rack consumes 600 kW,” says David Kuo, vice president for product marketing and business development at Point2 Tech. “There is a lot of effort to reduce that, but you have to start looking at it, not at the chip level, but at the rack-scale level, or even at the data-center level. It’s not about transistors anymore. A major part of the power consumption is moving data, just moving bits from one location to another, and that’s the interconnect itself.”
The interconnect, and the data passing across it, are a big part of the problem. “An increasing share of the power budget is consumed in the movement of data, both between processors and memory and between processors,” says Rambus’ Woo. “Computer architectures need to keep the data as close as possible to where the processing occurs. This will mean superchips composed of multiple reticle-sized die co-packaged into one larger processor, with larger amounts of directly connected memory to support them. The emergence of more specialized chips and cores targeted at specific high-value functions will also improve performance and power efficiency.”
As the compute problem grows in scale, so do the racks. “The entire system needs to work together,” says Point2 Tech’s Kuo. “The interconnect portion within a rack can consume anywhere between 10% to 15% of the total data center power consumption. How much can we reduce the energy consumed by the interconnect, take that energy and give it back to the compute portion?”
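As a rough illustration of what is at stake, applying the quoted interconnect share to the quoted rack figure gives the order of magnitude involved. These are not measured values, just the two numbers above combined.

```python
# Illustrative arithmetic only: the quoted 10-15% interconnect share applied to a
# 600 kW rack. Real allocations depend on the deployment and how the share is counted.
RACK_KW = 600.0
for share in (0.10, 0.15):
    print(f"interconnect at {share:.0%}: ~{RACK_KW * share:.0f} kW of the rack budget")
# Roughly 60-90 kW per rack that more efficient links could hand back to compute.
```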
Another aspect of the problem is that software and algorithms are moving so fast that hardware architectures have to remain flexible enough to handle changing workloads. “Depending on the application, we have seen that proper data-preprocessing can heavily reduce the size of an otherwise power-hungry neural network,” says Benjamin Prautsch, group manager for advanced mixed-signal automation at Fraunhofer IIS. “However, this investigation takes extra development time and might be limited to a specific application. Thus, the cost of power will often be externalized to the user instead of being considered in the design phase. Only when seen as a benefit to the user, and included as part of the product requirements, will the extra time and effort be spent on design to reduce power.”
Flexible architectures go against the trend toward domain-specific computing, which can focus on power reduction for precisely defined workloads. “The requirement from our customers is to define IP for the target performance, within a very efficient power and cost envelope,” says Kommandur. “When we design both the hard IPs and soft IPs, and now going into the chiplet space, we have to design for power all the way from the front end of the flow to the back end of the flow. Typically, a lot of these IPs are targeted at certain applications that are associated with critical workloads. We really have to make sure we work very closely with the micro-architecture and architecture teams to get the workloads of interest. We have to optimize all the way through synthesis, place-and-route, and then the sign-off flows to make sure they meet the optimal power envelope.”
Analysis

There is still room for improvement at the back end, where new sources of energy waste are being eliminated. “The frontier on that is glitch power, which has long been neglected,” says Ansys’ Swinnen. “Glitch power is a significant portion of total power, but since it’s not part of the logic function, it shouldn’t be there at all. There are various kinds of glitch power, and it has been tough to modify the design to reduce it. It all depends on the precise timing of the signals. It’s only recently that tools have existed that are able to analyze this early on, identify the top contributors, and fix those.”
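To make the timing dependence concrete, consider a classic static hazard. In the circuit y = (a AND b) OR (NOT a AND c) with b = c = 1, the output should hold at 1 when a falls, but if the inverter path is slower, the output briefly drops and recovers. Those extra transitions charge and discharge the net, burning energy without changing the logical result. The delays below are invented for illustration.

```python
# A minimal sketch of a static-1 hazard: y = (a & b) | (~a & c) with b = c = 1.
# If the inverter output arrives late, y briefly glitches to 0 when a falls.
B = C = 1
INV_DELAY = 2      # hypothetical extra delay on the inverter path (time steps)
A_FALL = 5         # a falls 1 -> 0 at t = 5

def a(t):
    return 0 if t >= A_FALL else 1

def y(t):
    a_inv = 1 - a(t - INV_DELAY)     # ~a, seen INV_DELAY steps late
    return (a(t) & B) | (a_inv & C)

wave = [y(t) for t in range(10)]
glitch_toggles = sum(wave[i] != wave[i + 1] for i in range(len(wave) - 1))
print(wave)            # [1, 1, 1, 1, 1, 0, 0, 1, 1, 1] -- the 0s are the hazard
print(glitch_toggles)  # 2 transitions that dissipate energy but do no useful work
```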
While workloads can be used to examine power at the block level, it becomes more difficult as system size grows. “Emulation is a stopgap solution at best,” says Saif. “It does enable you to look at the working scenario of the chip, to test whether it will stay within its power budget and hold its thermal limits, and to see whether the power delivery network can successfully deliver the power or will fail in some cases. But it is expensive and time-consuming to get to the necessary RTL. We come back to this topic of needing a system-level solution, without the need for emulation, where you define the inputs and outputs of the system and its expected functionality, and the solution gives you a realistic way of exercising that real-world scenario.”
It is becoming critical to look at the workloads of interest when optimizing the design. “There is a growing need to run larger and larger vectors, typically on emulation platforms,” says Kommandur. “When you look at a design with a given workload of interest, or a set of workloads with weighting factors based on design context, you can drive synthesis and placement optimization to pull the high-activity nets closer together. This reduces the capacitive impact of the interconnects. Then you go through clock-tree synthesis and routing to optimize the design. That’s typically how design optimization will flow.”
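The arithmetic behind that optimization is the standard dynamic-power relationship, P ≈ α·C·V²·f per net: shortening a wire saves power in proportion to how often it toggles, so the same capacitance reduction is worth far more on a high-activity net than on a quiet one. The activity factors, capacitances, and the 30% reduction below are invented for illustration.

```python
# A minimal sketch of activity-weighted optimization. All numbers are invented.
V, F = 0.75, 2.0e9   # supply voltage (V) and clock frequency (Hz), assumed

# (net name, toggle activity alpha, wire capacitance in farads)
nets = [
    ("bus_data",  0.40, 12e-15),   # high-activity datapath net
    ("ctrl_idle", 0.02, 12e-15),   # mostly quiet control net, same wire length
]

def dyn_power(alpha, c):
    return alpha * c * V * V * F   # P = alpha * C * V^2 * f

for name, alpha, c in nets:
    before = dyn_power(alpha, c)
    after  = dyn_power(alpha, 0.7 * c)   # 30% less capacitance after placement shortens it
    print(f"{name}: {before * 1e6:.2f} uW -> {after * 1e6:.2f} uW "
          f"(saves {(before - after) * 1e6:.2f} uW)")
# The identical 30% capacitance cut saves 20x more power on the high-activity net,
# which is why workload activity data is fed into placement and routing.
```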
To bring this kind of analysis up to the system level requires new tools, models, and standards. “There needs to be an industry-wide effort,” says Saif. “There are a few IEEE initiatives, one of which is UPTC, the universal power and thermal coalition. The major players are taking part in that, and bringing to light their own in-house solutions. Together, there is a discussion about how those solutions could help shape universally accepted formats that would work for most companies and most designs. It is really challenging to create something that would work for everyone, but even something that works for most situations would be accepted by those participants. It would not only need consensus from these major players, but also investment commitment from the major EDA vendors.”
Much of the work is currently being done within the systems companies. “People are running blind,” adds Saif. “All the big houses have their own in-house, homegrown, custom solutions that work for their system design, for their architectures, software, and microarchitecture, and their interaction with hardware. They have some way of estimating power and performance at that stage. The moment you try to scale them or apply them to different problems, they fail miserably. There is a hole in the EDA space for a reliable, universally accepted solution.”
Conclusions

Power has risen in importance because today it is impacting the bottom line for hyperscalers. It is a direct cost of building and running data centers, both in terms of the energy consumed and the cost of cooling. It is creating problems with the availability of power, because the generation industry cannot keep up. Power has become a highly visible cost. But the industry lacks the tools, methodologies, and architectures necessary to make a meaningful impact in the short term.
Editor’s note: A second part of this story will examine both the emerging technologies that may have a significant impact on reducing energy consumption, and the technologies that are holding the industry back.
Related Reading

Re-Architecting AI For Power
Is AI using too much power? Some people think so, and believe the easy gains in power reduction have already been made.