Heading to meetings in Silicon Valley, I often drive through Santa Clara, passing boxy buildings with few windows. They are data centers for local customers willing to pay for low latency. Data centers cluster in Santa Clara because that city’s power has been the cheapest in Silicon Valley. The San Jose Mercury News recently reported that two data centers in Santa Clara are empty, waiting for a $450 million upgrade to be completed in 2028 — three years out. Power is a limiting constraint for data centers across the USA.
Figure 1: One of dozens of data centers in Santa Clara, California (Silicon Valley)
Morgan Stanley recently projected a US power shortfall through 2028, totaling 45 gigawatts (20% of total data center demand), some of which can be handled by on-site power generation. It will get worse. OpenAI alone is projecting 250 GW just for its data centers by 2033.
The good news is that semiconductor innovations exist that can probably close the gap.
GenAI demand continues to grow exponentially
GenAI demand growth to date has been stunning, and projections call for continued rapid growth.
- OpenAI’s ARR (annualized revenue run rate) has soared from $1 billion in 2023 to $5 billion in 2024, and is now expected to top $20 billion by the end of this year. OpenAI’s projections are bullish enough that it has entered into $1.4 trillion of capacity contracts covering the coming 7 years.
- Anthropic is keeping pace, growing from $1 billion last year to an expected $9 billion by the end of this year. Anthropic recently projected $70 billion in revenue in 2028 with positive cash flow!
- Google TPU and AWS Trainium volumes have moved into millions of units.
- Nvidia is asking TSMC to ramp production to satisfy rapidly growing Blackwell demand.
The major hyperscalers continue to ramp up their CapEx plans and report huge increases in tokens processed year-over-year.
Meanwhile, adoption of GenAI is still low with <10% of companies and probably <10% of people at those companies using AI really well. That means the upside for GenAI usage is huge as others catch up (or the leaders take the market share of the laggards).
The US power grid can’t keep up
The US power grid hasn’t grown much for years. Demand increases were offset by efficiencies. But now data centers are starting to significantly increase the demand on the US power grid.
In the US, the lead time for major transmission lines or natural gas pipelines is over a decade. New power generation plants take 5+ years to approve, longer for nuclear. Solar is now the cheapest form of power, but most panels are made in China and are now subject to very large tariffs. And the places with lots of space in the USA have limited transmission capacity. (Contrast this with China, where the world’s largest solar and wind installations, in the northwest, are carried by >40 new ultra-high-voltage transmission lines to the populous East.)
Data centers now are being built where grid power is available.
The next step is for data centers to generate their own power on-site or near-site. Gas turbine manufacturing capacity is booked through 2030. Once-unloved fuel cells are now in huge demand. Airplane engines are being converted for power generation. At some point, data centers may be built in locations where very large solar and wind fields can be sited, if water is available, or at natural gas fields that have capacity in excess of the pipelines they feed.
The challenge of acquiring power will push data centers toward more power-efficient chips (even where power is available, electricity is a major cost worth reducing).
Chips that can get more GenAI throughput from scarce watts
There are several ways semiconductors can help new data centers get more throughput per watt: power electronics, more memory with less latency, larger pod sizes, more efficient accelerators.
**SiC/GaN FETs increase power converter efficiency & reduce cooling requirements** Power supply units (PSUs) in the data center take in AC power and convert it to DC power for the chips. Traditional PSUs are 90% to 95% efficient (5% to 10% of the AC power is lost to heat in conversion, which is a double whammy as more power-hungry cooling is required).
SiC (Silicon Carbide) MOSFETs and GaN (Gallium Nitride) FETs from Wolfspeed, STM, Infineon, On Semi, Navitas, and TI can boost PSU efficiency to 98% or more.
A data center with SiC/GaN PSUs can cut AC power requirements by ~5% to 10%.
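As a rough check on that claim, here is a back-of-envelope sketch (the 1 MW load, the cooling overhead factor, and the efficiency pairings are illustrative assumptions, not vendor figures):

```python
# Back-of-envelope: AC power needed to deliver 1 MW of DC to the chips,
# comparing a traditional silicon PSU with a SiC/GaN PSU.
def ac_power_required(dc_load_mw: float, psu_efficiency: float,
                      cooling_w_per_w_of_heat: float = 0.3) -> float:
    """AC draw = DC load / efficiency, plus assumed cooling power to remove the conversion heat."""
    ac_in = dc_load_mw / psu_efficiency
    conversion_heat = ac_in - dc_load_mw
    return ac_in + conversion_heat * cooling_w_per_w_of_heat

sic_gan = ac_power_required(1.0, 0.98)          # SiC/GaN PSU at 98% efficiency
for legacy_eff in (0.90, 0.95):                 # traditional PSU range per the text
    legacy = ac_power_required(1.0, legacy_eff)
    print(f"{legacy_eff:.0%} -> 98% efficient PSUs: "
          f"{100 * (1 - sic_gan / legacy):.1f}% less AC power")
# prints roughly 10% (vs. a 90%-efficient PSU) and 4% (vs. a 95%-efficient one)
```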
**800V HVDC replacing existing 48-54V** TI, ADI, Infineon, Renesas, and others are enabling 800V HVDC (high-voltage DC) architectures for next-gen data centers. Until now, data centers have distributed DC power at 48 to 54V. Power loss in distribution is proportional to the square of the current (P_loss = I²R). Increasing the voltage from 48/54V to 800V cuts the current required to roughly 6% to 7% of what it was (about a 15X reduction), so resistive losses plummet. In practice, 800V HVDC reduces power requirements by ~5%. It also means cables are thinner and lighter for the same amount of power, cutting copper use by half. (Initial NVL72 racks shipped with 54V, but are designed to shift to 800V.)
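To make the I²R arithmetic concrete, here is a minimal sketch (the 100 kW rack and the 1 milliohm distribution resistance are illustrative assumptions):

```python
# Distribution loss scales with the square of the current: P_loss = I^2 * R.
# Delivering the same power at 800V instead of 54V cuts the current ~15X,
# so resistive loss drops ~220X for the same conductor (in practice the
# savings are taken partly as thinner, lighter cables with less copper).
def distribution_loss_w(power_w: float, voltage_v: float, resistance_ohm: float) -> float:
    current_a = power_w / voltage_v           # I = P / V
    return current_a ** 2 * resistance_ohm    # P_loss = I^2 * R

rack_power_w = 100_000          # assumed 100 kW rack
path_resistance_ohm = 0.001     # assumed 1 milliohm distribution path

for volts in (54, 800):
    amps = rack_power_w / volts
    loss = distribution_loss_w(rack_power_w, volts, path_resistance_ohm)
    print(f"{volts:>3}V: {amps:7.1f} A, {loss:7.1f} W lost in distribution")
# 54V: ~1,852 A and ~3,429 W lost;  800V: 125 A and ~16 W lost
```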
**CXL-attached Memory** In GPU pods now, aggregate HBM is often not enough to handle growing KV caches (cache memory that stores earlier computations to avoid re-computation) and growing context windows (which enable the model to remember past interactions, key data, etc.).
Astera Labs recently announced that Microsoft is the first to deploy its CXL Smart Memory Controllers into Microsoft’s Azure M-Series VMs. This provides up to an additional 2TB of DDR5 memory capacity with relatively low latency. (An NVL72 GB200 rack has 13.5TB of HBM3e memory).
Figure 2: Astera Labs Leo A1000 CXL Smart Memory Add-in Card
Astera claims this boosts throughput by 40% for LLMs (not clear if this is for 1 card or multiple). Since GPUs spend a lot of time idle, increasing throughput means increased utilization of the GPUs. A 40% throughput boost at roughly constant rack power means roughly 30% less energy for the same amount of work (1/1.4 ≈ 0.7). The additional DDR5 DRAM burns incremental power, but that’s relatively small compared to the NVL72 Blackwells. And the CXL-connected DRAM means that the higher-latency DRAM can be reduced in size.
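A minimal sketch of that energy arithmetic (the rack power and baseline throughput below are placeholder assumptions, not Astera or Nvidia figures):

```python
# If CXL-attached memory raises throughput ~40% while rack power stays roughly
# flat, the energy spent per unit of work falls by about 1 - 1/1.4, i.e. ~29%.
def energy_per_token_j(rack_power_w: float, tokens_per_s: float) -> float:
    return rack_power_w / tokens_per_s

rack_power_w = 120_000             # assumed NVL72-class rack power
baseline_tps = 1_000_000           # placeholder baseline throughput
boosted_tps = baseline_tps * 1.4   # +40% throughput with CXL-attached DDR5

base = energy_per_token_j(rack_power_w, baseline_tps)
boosted = energy_per_token_j(rack_power_w, boosted_tps)
print(f"energy per token: {base * 1000:.0f} mJ -> {boosted * 1000:.0f} mJ "
      f"({100 * (1 - boosted / base):.0f}% less energy for the same work)")
# prints 120 mJ -> 86 mJ, about 29% less
```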
Nvidia recently bought Enfabrica for $900 million for its Accelerated Compute Fabric, which creates a high-speed network to connect GPUs with large pools of DDR expansion memory.
**Optimized AI Accelerators** GPUs can run any kind of model and do training as well as inference.
LLMs today are primarily transformer models, and the growing majority of demand is for inference. AI accelerators that are optimized for what the main market needs can be more efficient for those applications. For example, Anthropic, in a recent press release, stated that Google’s TPUs are typically more power-efficient than Nvidia’s GPUs.
Figure 3: Google Ironwood Rack with Optical Inter-rack connection (Source: Hot Chips 2025)
**Larger pods using SoW-X (system-on-wafer, 2nd generation)** Nvidia and others have been increasing GenAI throughput/watt with more powerful AI Accelerators, more per package, and scale-up networks connecting more AI accelerators. Nvidia introduced NVL72 this year, and its roadmap grows pods to NVL144, then NVL576. GPUs are going from 1 to 2 to 4 per package. All this uses copper, which is constrained to 1 to 2 meters within a single rack.
As long as copper is the backbone of the scale-up network, a logical next step is to use an entire wafer as an interposer to squeeze more GPUs into a single rack. TSMC in 2020 introduced the first solution called InFO-SoW, which was used by Tesla’s Dojo. More recently, it introduced SoW-X, which adds HBM integration. At the IEEE ECTC (Electronic Components and Technology Conference), TSMC presented a paper on SoW-X describing integration of 16 full-reticle-sized ASICs, 80 HBM4, and 2,800 224Gb/s long-range SerDes, providing massive bandwidth. Clustering more GPUs closer together boosts performance. TSMC estimates SoW-X delivers almost 2X better power efficiency (AI throughput per watt). The number of GPUs per compute tray will double or quadruple (depending on packaging and power density constraints). The GPU 2X efficiency increase is amplified by a 2 to 4X larger pod, which will deliver very roughly a 4 to 8X improvement in throughput/watt. (The rack power is way up, but throughput is up a lot more.)
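Compounding those rough multipliers explicitly (these are the ballpark assumptions from the paragraph above, not measured results):

```python
# The paragraph's rough accounting: ~2X throughput/watt from SoW-X packaging,
# amplified by another ~2-4X from the larger pod (more GPUs reachable without
# crossing slower, power-hungrier scale-out links).
sow_x_gain = 2.0                    # TSMC's ~2X throughput/watt estimate
pod_scale_gains = (2.0, 4.0)        # assumed extra factor from 2-4X larger pods

for pod_gain in pod_scale_gains:
    print(f"{pod_gain:.0f}X larger pod -> ~{sow_x_gain * pod_gain:.0f}X throughput/watt overall")
# prints ~4X and ~8X, matching the rough 4-8X estimate above
```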
Figure 4: Nvidia CEO holding what looks like a System on Wafer (GTC March 2025)
There is another TSMC technology, CoWoP (Chip on Wafer on Platform), which removes the package substrate and connects the interposer directly onto the motherboard. Digitimes reports that Nvidia is considering CoWoP. Removing a layer from the interconnect stack improves signaling performance at lower power and simplifies power management.
**CPO (co-packaged optics) for scale-out networks** CPO cuts power substantially in scale-out networks. It uses 1/3 the power of pluggable optics and is cheaper and smaller. Roughly 5% to 10% of data center power today is used by pluggable optics. CPO for scale-out networks can reduce this by two-thirds. Broadcom and Nvidia have already introduced CPO switches in parallel with copper versions.
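At the facility level, those percentages imply savings along these lines (the 100 MW facility size is an illustrative assumption; the 5-10% optics share and the 1/3 power figure come from the estimates above):

```python
# If pluggable optics consume 5-10% of facility power and CPO runs at ~1/3
# the power of pluggables, the facility-level saving is roughly 3-7%.
facility_power_mw = 100.0                  # assumed 100 MW data center
for optics_share in (0.05, 0.10):
    pluggable_mw = facility_power_mw * optics_share
    cpo_mw = pluggable_mw / 3              # CPO at ~1/3 the power of pluggables
    saved_mw = pluggable_mw - cpo_mw
    print(f"optics at {optics_share:.0%} of facility power: "
          f"save {saved_mw:.1f} MW ({100 * saved_mw / facility_power_mw:.1f}% of the facility)")
# prints ~3.3% and ~6.7% of total facility power
```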
**Larger pods using CPO (co-packaged optics)** CPO improves throughput/watt versus copper by letting a pod span multiple racks: whatever pod size copper can reach, CPO can enable a 4 to 10X increase, which dramatically reduces the power needed for a given level of GenAI throughput.
CPO is not an alternative to SoW-X. Both can be used together. If a 2-XPU package can have 256+ optical links, an SoW-X with 16 XPUs could have 1,024 to 2,048 optical links, enabling potentially much larger, much more power-efficient pod sizes.
SoW-X isn’t limited to compute. Imagine a SoW-X switch with 2,048 optical links connecting to 2,048 SoW-X accelerators, each with 16 XPU dies. This creates a roughly 32,000-GPU pod with single-hop latency. This would provide much higher throughput/watt for the largest models.
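The link and pod counts above are simple multiplication; a quick sketch using the speculative per-package figures from the last two paragraphs:

```python
# Speculative pod sizing from the text: 256+ optical links per 2-XPU package,
# 16 XPU dies per SoW-X (the text's 1,024-2,048 link range; this computes the
# upper bound), and one SoW-X accelerator per switch port.
links_per_2xpu_package = 256
xpus_per_sow_x = 16
packages_per_sow_x = xpus_per_sow_x // 2          # 8 two-XPU packages per wafer

links_per_sow_x = links_per_2xpu_package * packages_per_sow_x
print(f"optical links per SoW-X accelerator: {links_per_sow_x:,}")   # 2,048

switch_ports = 2_048                               # one SoW-X switch
pod_xpus = switch_ports * xpus_per_sow_x           # one accelerator per port
print(f"single-hop pod size: {pod_xpus:,} XPUs")   # 32,768, i.e. ~32,000
```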
**Replace older AI racks** All of the above lead to much greater power efficiency than was available in the past. If power is otherwise unavailable, hyperscalers will be incentivized to replace older Nvidia GPUs with the latest to get a doubling or quadrupling of throughput per watt. This will be economically feasible if the demand is there.
**Software can help too: small language models (SLMs)** The leading-edge models have giant compute requirements and a huge base of built-in knowledge. But if you have a focused application, you don’t need the ability to speak Chinese, quote every line of Shakespeare, or list the Kings of England since 1066. A small language model (SLM) is trained on a focused dataset optimized for a specific task. It therefore has one or two orders of magnitude fewer parameters, with correspondingly lower compute cost and power, but can still answer the questions that matter for its focused domain.
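The compute saving follows from the usual rule of thumb that transformer inference costs roughly 2 × (parameter count) FLOPs per generated token; the model sizes below are illustrative, not references to specific products:

```python
# Rule of thumb: ~2 * parameters FLOPs per generated token, so a model with
# 100X fewer parameters needs ~100X less compute (and roughly proportionally
# less energy) per token.
def flops_per_token(params: float) -> float:
    return 2.0 * params

frontier_params = 1e12    # illustrative trillion-parameter frontier model
slm_params = 1e10         # illustrative 10-billion-parameter SLM

ratio = flops_per_token(frontier_params) / flops_per_token(slm_params)
print(f"the frontier model needs ~{ratio:.0f}X more compute per token than the SLM")
# ~100X, i.e. the "one or two orders of magnitude" mentioned above
```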
Nature recently reported on Tiny Recursive Models that outperformed the best LLMs on visual logic puzzles like Sudoku and mazes using 1/1,000th the compute.
Emerald AI’s Conductor platform is software that coordinates the data center with the power grid. It appears to give the grid operator the ability to, among other things, pause or slow batch workloads to temporarily reduce grid demand. Emerald AI says this added flexibility will let power providers grant grid access in months, not years. The company is working with Nvidia and Digital Realty on the first power-flexible AI factory in Virginia.
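To illustrate the idea (a hypothetical sketch of power-flexible scheduling, not Emerald AI’s actual Conductor API): when the grid operator requests a lower power cap, interruptible batch jobs get paused or deferred first so latency-sensitive inference keeps running.

```python
# Hypothetical power-cap scheduler (illustration only, NOT Emerald AI's API):
# keep latency-sensitive serving jobs, pause batch jobs that don't fit the cap.
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    power_mw: float
    interruptible: bool        # batch training/fine-tuning can be paused; serving cannot

def apply_grid_cap(jobs: list[Job], cap_mw: float) -> list[Job]:
    """Return the jobs allowed to keep running under the grid's requested power cap."""
    running, load = [], 0.0
    for job in sorted(jobs, key=lambda j: j.interruptible):   # serving jobs first
        if not job.interruptible or load + job.power_mw <= cap_mw:
            running.append(job)
            load += job.power_mw
    return running

jobs = [Job("inference-serving", 40.0, False),
        Job("nightly-finetune", 35.0, True),
        Job("batch-eval", 20.0, True)]

for job in apply_grid_cap(jobs, cap_mw=60.0):        # grid asks for a 60 MW cap
    print(f"keep running: {job.name} ({job.power_mw} MW)")
# keeps inference-serving and batch-eval; nightly-finetune pauses until the cap lifts
```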
Power-efficient AI chips will maximize the US grid & save $
There are numerous options for significant improvements in GenAI power efficiency with ICs.
Even in places where power is plentiful (China, Saudi Arabia), electricity is still a significant cost factor in data center operations.
Power constraints in the USA will spur development of more power-efficient AI chips, which should allow rapidly growing GenAI demand to be met while saving money on electricity. This also matters for limiting the political backlash that will come with rising residential electricity rates.