A rambling post. I’m more uncertain of my conclusions in this one. Note: I made some edits to the wording after this was published.
Eventually, we’re going to stop making better computers. In other words, the real cost per unit of useful computing will flatten out, becoming relatively constant over time. Innovation in semiconductor chips will slow and the low-hanging fruit in other parts of the computing supply chain will get eaten[1].
This will have implications for everything.
This follows from my view that breakthroughs are rare and becoming rarer. We try stuff and converge on good solutions. Eventually we find the economically efficient way to make a product and move on. This is true for all goods, from cement to computer chips.
This is very clear in the semiconductor industry for two reasons.
**First**, because semiconductors make you painfully aware of the limitations of matter and physics. Transistors are dozens of atoms wide; there isn’t much room left at the bottom. Smaller isn’t necessarily better, either: smaller transistors leak more current, raising energy consumption and error rates.
Long ago, technology node names like 90 nm actually corresponded to the size of the transistor. Progress stalled, but the industry kept up appearances by letting node names follow the old trend. This year, TSMC is producing “2 nm” chips, but the actual size of the transistors is closer to 45 nanometers. Transistor shrinkage has slowed, and Dennard scaling “ended” around 2006.
**Second**, because the semiconductor industry has tried a lot of stuff. It has already saturated many S-curves. Dennard scaling was the first casualty, but there have been many more. Thousands of companies have sprung up and died optimizing one part of the supply chain, only to be obsoleted by something else.
Watch one of my favorite videos and be awed by the amount of innovation that went into creating just one flashlight in the semiconductor manufacturing process:
Watch some of the other videos in that playlist and you’ll see how much work and deliberation went into developing EUV. The industry explicitly considered other techniques, built out prototypes, started companies, and then went with EUV. That new company touting X-rays as the new way to make chips? Yeah, the industry has already considered and rejected X-rays twice.
This is a microcosm of what’s happening at every step of chip production. Processes are evolving, with massive research effort going into trying different techniques[2]. The process we have today is a product of over 60 years of tinkering. With growing research effort, chips can steadily improve until we’ve wrung out every opportunity.
Sorry, but current AI won’t revolutionize chip design. Eventually there will be enough training data for AI to design chips, but we quickly run into the problem that logic optimization and place-and-route are NP-hard. Chip design is a fundamentally hard problem. AI tools will help, but they have limits[3].
Let me show you why space data centers are silly in the length of a skeet.
Option A: build solar, radiators, and chips on Earth.
Option B: build solar, radiators, and chips on Earth AND pay $3,000/kg plus depreciation to put them in space.
You’re paying extra money for no apparent benefit.
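To make the asymmetry concrete, here’s a back-of-the-envelope sketch in Python. The $3,000/kg launch price comes from Option B above; the hardware cost per kilogram and the depreciation penalty are placeholder assumptions, not real figures.

```python
# Back-of-the-envelope comparison of Option A vs Option B.
# Only the $3,000/kg launch price comes from the post; the rest are
# placeholder assumptions for illustration.

hardware_cost_per_kg = 2_000      # assumed: $/kg for solar, radiators, and servers combined
launch_cost_per_kg = 3_000        # from the post: $/kg to orbit
space_depreciation_penalty = 1.5  # assumed: shorter hardware life in orbit (radiation, no repairs)

earth_cost = hardware_cost_per_kg
space_cost = hardware_cost_per_kg * space_depreciation_penalty + launch_cost_per_kg

print(f"Option A (Earth): ${earth_cost:,.0f}/kg")
print(f"Option B (space): ${space_cost:,.0f}/kg ({space_cost / earth_cost:.1f}x more)")
```

Whatever numbers you plug in, Option B starts from Option A’s costs and then adds launch and a harsher operating environment on top.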
The same principle applies if you’re launching self-replicating asteroid-mining nanobots. You could have used them on Earth and saved yourself the launch costs, the harsh environment, and the time delay.
If we build enough data centers on Earth that all the low-cost land and resources get used up, then maybe space data centers make sense. The latency is much higher, and it’s unclear if there will be demand for compute at that scale, at least on Earth. I’ll address that possibility in a later section.
For now, going into space doesn’t dramatically lower compute costs.
There are many unconventional computing paradigms. Perhaps one will swoop in once we’ve exhausted CMOS?
I’m skeptical, in part because none of them have overtaken semiconductors despite many, many people trying. It’s a trillion dollar bill that nobody has picked up.
Of the unconventional computing paradigms, we can safely reject those that use components much larger than today’s transistors. The speed-of-light limitations are daunting. Fluidics, MEMS, and biopolymers are simply too large to compete with the speed and cost of CMOS.
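To see why size matters so much, here’s a quick speed-of-light budget in Python. The clock rate, signal speed, and feature sizes are illustrative assumptions, not measurements of any particular technology.

```python
# How far a signal can travel in one clock cycle, and how many
# logic elements that distance spans at different feature sizes.
# All numbers are illustrative assumptions.

c = 3.0e8                  # speed of light in vacuum, m/s
clock_hz = 3.0e9           # assumed: 3 GHz clock
signal_fraction = 0.5      # assumed: signals propagate at ~half the speed of light

reach_m = c * signal_fraction / clock_hz
print(f"Reach per cycle: {reach_m * 100:.0f} cm")

transistor_pitch_m = 50e-9   # assumed: rough scale of modern CMOS
big_component_m = 100e-6     # assumed: rough scale of a fluidic or MEMS element

print(f"CMOS-scale elements reachable per cycle: {reach_m / transistor_pitch_m:,.0f}")
print(f"Large-component elements reachable per cycle: {reach_m / big_component_m:,.0f}")
```

With components thousands of times larger, a signal crosses far fewer of them per cycle, so the paradigm has to be dramatically better in some other way just to break even.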
Analog computing paradigms (think “neuromorphic computing” or “thermodynamic computing”) are fraught as well. There have been many failed attempts in this field, and Zach lays out the challenges in Making Unconventional Computing Practical[4].
The problem with analog is that when your signal is continuous, it’s easy to mess it up. Temperature changes, manufacturing variability, and interference all change the signal a little bit. In a chaotic system, those errors propagate out and mess up your computation. That’s why most analog computing companies have failed or switched to digital logic. Digital just works.
That leaves us with a few paradigms. I honestly don’t know how these will pan out, but I’ll note reasons for skepticism.
**Spintronics** incorporates materials that control electron spin into otherwise traditional chips. The hope is that electron spin might provide an additional way to store or transfer information. I do think this has promise for improving memory bandwidth for some applications. But given how sensitive electron spin is to stray electromagnetic fields, I’m skeptical that the benefits are worth the costs of incorporating new materials and design complexity.
**Optical computing** startups have, in the last few years, pivoted from making computers to making optical interconnects for traditional data centers. Not a promising sign. Hopefully, this beachhead will enable optical computing to scale. However, optics inherits some of the challenges of analog computing, namely temperature dependence and manufacturing variability. These can be addressed, but at added cost.
The bigger issue is that optical components have to be physically larger than transistors, on the order of the wavelength of the light they carry. This makes them 10-100x larger than transistors. Patterning at a larger scale means higher costs[5][6]. The saving grace is that light can send many different signals at the same time (multiplexing). So even if the components are 10x more expensive, they might carry 100x more information. This makes optical the most promising contender.
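Here’s the rough arithmetic behind that tradeoff, as a sketch. The wavelength, transistor pitch, cost penalty, and channel count are all assumptions chosen to illustrate the 10x-cost / 100x-information point above, not measured figures.

```python
# Footprint penalty vs multiplexing benefit for optical components.
# All numbers are illustrative assumptions.

wavelength_nm = 1310        # assumed: a common telecom wavelength
transistor_pitch_nm = 50    # assumed: rough scale of a modern transistor

linear_ratio = wavelength_nm / transistor_pitch_nm
print(f"Optical components are roughly {linear_ratio:.0f}x larger (linear)")
print(f"...and roughly {linear_ratio ** 2:.0f}x larger in area")

cost_penalty = 10           # assumed: an optical element costs 10x more to pattern
wdm_channels = 100          # assumed: wavelength channels multiplexed per waveguide
print(f"Information per unit cost vs a single wire: ~{wdm_channels / cost_penalty:.0f}x")
```

If multiplexing keeps winning by more than the patterning penalty, optics gets cheaper per bit moved; if not, it stays confined to interconnects.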
**Superconducting computing** inherits all of the problems of analog computers, has components that are physically larger than conventional transistors, and demands very low temperatures. People have tried and failed. I’ll pass.
**Quantum computers** (which often leverage superconducting components) offer small speedups on a few problems of interest. They have to be kept very cold and shielded, and you need lots of qubits for error correction. I doubt these will become cost-effective for general-purpose computing.
**Reversible computing**[7] is interesting if compute scales to space, where energy is expensive and heat is a problem. These benefits are far weaker on Earth. Error correction is harder, the designs are more complicated, and perfect reversibility is impossible, which reduces the energy-consumption advantage.
A key point from my breakthroughs post was that once you “solve” a particular product, innovation jumps somewhere else. First you figure out how to plant, then how to breed plants, then how to make fertilizer, then how to automate harvesting, and so on.
As semiconductor innovation slows, innovation in other areas will grow. Solar and batteries will provide cheap data center electricity, new cooling systems will lower costs, optical or RF interconnects will gain traction, satellite internet will lower latency, and of course the AI/software layer will improve. There are hundreds of opportunities for innovation here.
That said, this doesn’t refute the overarching conclusion. Eventually, innovation in these areas will grind to a halt; we will have found the right way to build a data center.
Pushing to shorter wavelengths of light has its limits, and high-NA EUV could be the end of photolithography. In the future of lithography, I speculated that electron beams might be the final frontier. To this day, I still think electron beam lithography might be the economic limit of chip manufacturing. Atomically precise manufacturing might make better chips, but they won’t be cheaper; photolithography will make cheaper chips, but they won’t be better.
We’re a long way off from electron beams beating EUV, though. Currently, e-beams are too slow. I’ll add some links in the appendix on making e-beams cheaper and on writing with thousands of beams simultaneously from one device, which seems promising.
The economics of LLM inference tell us that memory bandwidth is the key cost driver. Chip designers have come up with their own version of “just stack more layers LOL” called high-bandwidth memory (HBM). It involves stacking towers of fast memory close to the part of the chip that does the calculations.
Designers are pushing this S-curve as hard as they can. Unfortunately, each layer gets more expensive[8] while being slower, since it’s farther from the rest of the chip. At some point, you’d expect there to be a cost-optimal HBM size, just like we see for skyscrapers[9].
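A toy model makes the skyscraper analogy concrete: give every extra layer a slightly higher cost and a slightly lower contribution to bandwidth, and bandwidth per dollar peaks at some finite stack height. The fixed cost, cost growth, and latency penalty below are made-up assumptions, not real HBM figures.

```python
# Toy model of a cost-optimal HBM stack height.
# All parameters are made-up assumptions for illustration.

def bandwidth_per_dollar(layers: int,
                         fixed_cost: float = 10.0,      # assumed: base die, interposer, packaging
                         base_layer_cost: float = 1.0,  # assumed: cost of the first DRAM layer
                         cost_growth: float = 0.15,     # assumed: each extra layer costs 15% more
                         latency_penalty: float = 0.02  # assumed: each layer is a bit slower (farther away)
                         ) -> float:
    cost = fixed_cost + sum(base_layer_cost * (1 + cost_growth) ** i for i in range(layers))
    bandwidth = sum(1.0 - latency_penalty * i for i in range(layers))
    return bandwidth / cost

best = max(range(1, 33), key=bandwidth_per_dollar)
print(f"Cost-optimal stack height under these assumptions: {best} layers")
for n in (2, 4, 8, 16, 32):
    print(f"{n:2d} layers -> {bandwidth_per_dollar(n):.3f} (relative bandwidth per dollar)")
```

Change the assumptions and the optimum moves, but the shape of the curve (rise, peak, decline) is the point.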
Other avenues for hardware improvement are drying up too, as Tim Dettmers explains:
In the past it was possible to shrink the size of transistors to improve speed of a processor. This is coming to an end now. For example, while shrinking SRAM increased its speed (smaller distance, faster memory access), this is no longer the case. Current improvements in SRAM do not improve its performance anymore and might even be negative. While logic such as Tensor Cores get smaller, this does not necessarily make GPU faster since the main problem for matrix multiplication is to get memory to the tensor cores which is dictated by SRAM and GPU RAM speed and size. GPU RAM still increases in speed if we stack memory modules into high-bandwidth modules (HBM3+), but these are too expensive to manufacture for consumer applications. The main way to improve raw speed of GPUs is to use more power and more cooling as we have seen in the RTX 30s and 40s series. But this cannot go on for much longer.
He goes on to mention specialized logic and low-precision arithmetic as paths forward. But these have limits too.
If we had no more ideas, AI hardware would stop there. But the holy grail of inference compute is **compute in memory**. Since inference is a bunch of simple operations, make the arithmetic cores small and repeatable like memory. Put memory and arithmetic right next to each other and glue together massive amounts of compute.
This should increase memory bandwidth a lot while increasing costs somewhat less. On balance, it’s not clear where the cost per unit of memory bandwidth will end up. After that, we’re kind of out of ideas at the design level[10]. But there are decades of tweaks we can pursue.
In the semiconductor industry, tech that’s ~10 years old is roughly accessible to any country. The trade secrets have mostly diffused, the equipment can be procured, and the talent from that era can be hired. Consider how China leads the world in producing trailing-edge chips while still failing to produce leading-edge chips.
In a world where chips stop getting better, everyone catches up to the state-of-the-art in about 10 years. That means competition, innovation, and falling prices.
Even better, since chips stop improving, depreciation costs fall dramatically. You can keep using the same hardware for decades.
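A tiny illustration of that depreciation point, with a made-up purchase price and utilization: amortizing the same box over a longer useful life cuts the capital component of hourly cost roughly in proportion.

```python
# Hourly capital cost of a server vs. how long you keep using it.
# The purchase price and utilization are placeholder assumptions.

server_price = 30_000     # assumed: purchase price in dollars
utilization = 0.80        # assumed: fraction of hours the machine is actually busy

def hourly_capital_cost(useful_years: float) -> float:
    busy_hours = useful_years * 365 * 24 * utilization
    return server_price / busy_hours

for years in (3, 10, 20):
    print(f"{years:2d}-year useful life -> ${hourly_capital_cost(years):.2f}/hour")
```

When hardware improves quickly, the three-year number is the honest one; when it stops improving, the twenty-year number is.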
There’s already plenty of competition over model provision, especially from open source providers. When the monopolies in chip production fall, we get even more competition in this space.
What does the intelligence market look like in this world? I expect RL-as-a-Service (specialized models) will dominate. The base models these are built on will converge to a certain price per unit intelligence, itself dependent on the price per FLOP beneath it. I imagine a range of different base model sizes, with the largest sizes serving an inference oligopoly and smaller sizes seeing lots of competition and low margins. Specialized model providers can turn a profit by building domain knowledge on top of small models.
For more, see Economic futures for LLM inference.
There are two niches for AI hardware in the future.
One we’re already familiar with: chips and servers optimized for huge data centers. Here, high memory bandwidth at low cost is the key. This hardware can provide slow, low-cost AI by using large batches. It’s particularly valuable for supplying an inference oligopoly or autonomous AI workers.
On the other end of the spectrum is local AI hardware. Specialized chips in your laptop. These run small models (smaller memory footprint) and don’t have high utilization, so FLOP/s/$ is the key metric here. These models can be extremely fast and responsive, perfect for collaborating with your AI assistant.
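To see why large batches make the data-center niche cheap, here’s a toy memory-bandwidth-bound calculation. The model size, bandwidth, and hourly price are assumptions, and it ignores KV-cache traffic and compute limits entirely.

```python
# Toy model: cost per token when decoding is bound by streaming the
# model weights from memory once per step, shared across the batch.
# All parameters are illustrative assumptions.

model_bytes = 70e9 * 2      # assumed: 70B parameters at 2 bytes each
bandwidth_Bps = 3e12        # assumed: ~3 TB/s of memory bandwidth
cost_per_hour = 10.0        # assumed: $/hour for the accelerator

step_time_s = model_bytes / bandwidth_Bps   # time to read the weights once
print(f"Per-user speed: ~{1 / step_time_s:.0f} tokens/s (independent of batch size here)")

for batch in (1, 32, 256):
    tokens_per_hour = (3600 / step_time_s) * batch   # one token per sequence per step
    print(f"batch {batch:3d}: ~${cost_per_hour / tokens_per_hour * 1e6:,.2f} per million tokens")
```

The per-user speed stays modest while the cost per token collapses with batch size, which is exactly the slow-but-cheap niche. Local hardware faces the opposite situation: it serves a single user, so speed and responsiveness per dollar are what matter.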
I’m unsure whether there will ever be much demand for space computers. The previous section showed why space computers are more expensive, at least until compute demand outpaces Earth’s ability to provide it.
It’s not clear that terrestrial demand will get that high. AI models are keeping the same memory footprint while increasing capability. We might see a Kuznets curve of AI compute demand once Claude writes all our apps for us. I would guess per capita compute consumption isn’t rising much outside of AI. Future netizens might still be reading blogs and watching YouTube.
And remember, the latency of servers outside of LEO (which is where most of the raw material for chips is) is far higher than that of terrestrial servers.
Why would we want such a large amount of slow and expensive compute? I can think of one reason: The Age of Em[11].
Traditional semiconductor manufacturing will win for the foreseeable future. In perhaps 30 years, progress will come to an end. The state-of-the-art techniques will diffuse everywhere. The end of semiconductor monopolies and of hardware depreciation means compute will get very cheap. Perhaps another wave of innovation will bring an unconventional computing paradigm like optical to the fore. But I’m not holding my breath.
Space computing is the final frontier, the constraints are different and the scale far beyond what Earth could support. There we might find solutions new and bizarre.
Please enjoy these slapdash charts of hardware price-performance vs time. I focused on the two main performance metrics for AI chips.
I modified the Epoch data from here to make this spreadsheet. Note the values are in nominal terms; in real terms there’s been more improvement over time. I’m personally surprised at the lack of growth, and it’s possible I’ve made an error.
Note that future AI performance depends more on BF16 and smaller number formats. Likely we’ve seen much more dramatic improvement in these formats. The Epoch data on this was sparse, so I stuck with FP32.
From me:
Economic futures for LLM inference
RL-as-a-Service will outcompete AGI companies (and that’s good)
An intro to the Tensor Economics blog
From others:
Just read all of Zach’s tech blog; he’s more qualified.
The Rising Tide of Semiconductor Cost
The Unsustainability of Moore’s Law
How to Build a $20 Billion Semiconductor Fab
Semiconductor Fabs I: The Equipment
TSMC’s new 2nm chip will reportedly cost 50% more — get ready for more expensive laptops and phones
AI hardware and memory technology
The Special Memory Powering the AI Revolution
The Memory Wall: Past, Present, and Future of DRAM
[2601.05047] Challenges and Research Directions for Large Language Model Inference Hardware
The Breakthrough Solution to DRAM’s Biggest Flaw: probably the most exciting research on memory technologies I’ve seen. Get rid of the capacitor and just use transistors for memory. Works particularly well with compute in memory, though there are risks from using new materials and taking on more routing complexity.
Zach’s tech blog linked this presentation on 3D-split SRAM by Rahul Mathur. I don’t fully understand it, but I’m including it here because FOMO.
Cool stuff you can do with electron beams
Multiple-electron-beam direct-write comes of age
Hundreds of thousands of beams possible:
TSMC’s Incredible 2nm Curvy Masks - YouTube
Resolution Limits of Electron-Beam Lithography toward the Atomic Scale
Transverse Electron Beam Shaping with Light
Nanostructuring of electron beams
Precise atom manipulation through deep reinforcement learning
Atomic Semi • The Make Anything Company
[1] Granted, I think we still have 25+ years of semiconductor innovation left.
[2] Incidentally, the semiconductor industry seems to have embraced the “try stuff” model of innovation. They eschew theory and simply try many different things to solve a particular problem. Progress is evolutionary; companies stick to small changes to the things they know how to do.
[3] EDIT: I’m aware that Google touts an AI system for designing its TPUs. I can’t actually find much information on how much performance improvement is attributable to this AI system… hmm. There’s mention of this AI system reducing wire lengths by 6.2% on their chip. It probably reduces the time and cost of designing a chip too. These benefits are nothing to scoff at, but in a world where semiconductor progress slows, these are one-time gains. You’re only designing a chip once and using the same design indefinitely.
[5] To be clear, patterning at a large scale while demanding very precise features is the problem. In traditional semiconductor manufacturing, larger-scale features are actually cheaper. It’s the combination of the sensitivity of optical components and a large die area that makes it more expensive.
[6] It also means more time delay for your signals, but light moves a little faster than electrical signals and can multiplex, so it’s not clear if this is an issue.
[7] Which overlaps with superconducting computers and QCs.
[8] Errors accumulate, lowering yields. It also just takes more time per chip, increasing capital costs on a per-layer basis. And the wires are longer.
[9] This optimal size could have a big impact on large-model inference. More HBM means you can have a larger batch of users and lower costs, as long as users are okay with slow inference.
[10] One thing I’m unsure about is building an ASIC for a specific model. It should be far faster but also far more expensive.
[11] And digital minds more generally; the Dyson sphere might be mostly AIs.