Everyone decided to be super productive in the last three weeks, which is unfortunate because I’ve been in a food coma thanks to back-to-back Thanksgiving meals. This is what happens when both my wife’s family and my family have longstanding traditions of doing the full Thanksgiving spread and *also* live within 45 minutes’ drive of each other. Lunch at her place, dinner at my place, and I’m basically in hibernation for the next ten days.
But I had to get out of hibernation mode because, wow, everyone else has clearly *not* been relaxing and eating turkey. It’s become something of a yearly tradition for all of the big AI labs to just dump updates at the end of the year. Santa came early this year and he brought a *lot* of presents. So let’s dive in and see what exactly has been going on in Silicon Valley.
Do you guys remember what it was like a few years ago? Back in late 2022 or early 2023, when ChatGPT was still new and people thought this was the peak of image generation?
It was a fun time to be in the AI space. AI wasn’t obviously terrifying, people weren’t randomly up in arms about water usage, the titans of the tech world hadn’t gone politically insane, and every few days there would be some fantastic memes about how much Sam Altman was trolling Google. In videogame terms, OpenAI had tempo, a steady drumbeat of releases that minted millionaires and revolutionized how the world thought of linear algebra and GPUs. Meanwhile, Google was totally scattershot. They were pushing something called Bard (remember Bard?), which wasn’t particularly good, and even when they released cool things they would do so with terrible messaging.
Part of the story here is just that everyone was happy to support the obvious David in this David and Goliath story. But also, Google really was just caught totally unawares, and it took a long time for them to get their shit together. I know a bunch of people who jumped ship around this time, because morale was hovering somewhere between rock bottom and mutiny.
And like I said, some fantastic memes:
I sorta wonder if Sundar took it all a bit personally? Because in 2025, the situation is a bit different, and Google is, uh, to put it kindly, absolutely stomping OpenAI right now.
Just on the shallow, high-level layman’s view: the general consensus is that Gemini 3 is good, and GPT-5 was disappointing. Gemini 3 is an incredible model. It can do a lot of stuff, really well. In fact, the only thing that it is kind of mid at is coding (in my opinion, Claude is a bit better). This is borne out in benchmarks, where Gemini just *sweeps* everyone else.
*Google also just released a slate of vision-specific benchmarks for Gemini 3 Pro, which you can find here. Things are moving too fast; these Tech Things posts are getting stale even as I am writing them.*
Doing well on any one of these benchmarks suggests a very peaky model, maybe one with cherry-picked results, that is unlikely to be generally useful. Doing well on *all* of the benchmarks? That says something different.
There are a few benchmark scores here that I want to highlight.
The first is Humanity’s Last Exam (HLE). This has become the gold standard for really hard benchmarks. HLE is exactly what it sounds like: a benchmark composed of as many incredibly difficult problems as could be crowdsourced from the internet, with the intent of being the last word in exams that we could feasibly grade an AI on. Questions range across disciplines, from classics to physics to chess puzzles.
There has been some criticism of HLE — supposedly some of the questions in the dataset have incorrect answers — but there is a pretty high correlation between HLE performance and general-purpose “intelligence.” Gemini did exceedingly well on HLE, outscoring GPT-5 by 11 points (a ~40% relative improvement). HLE is supposed to be the best test humans as a species can put together. It was only released in January 2025, and Gemini is already nearly halfway to a perfect score!
The second is ScreenSpot-Pro. This is a pretty no-name benchmark. It doesn’t get talked about much in the AI world, at least not in the circles I am in. Engineers aren’t swapping ScreenSpot-Pro scores in the dive bars of Mountain View or Dogpatch, and for good reason: this benchmark is way too specific. The basic idea behind ScreenSpot-Pro is to see how well a model can identify real UI/UX elements needed to perform certain tasks, given high-resolution screenshots of application screens.
This benchmark is basically correlated with exactly one thing: how well your model can understand computer screens. For the most part, this is not a particularly useful skill. Unless, of course, someone can figure out how to get an AI to use a mouse and keyboard and drive a computer like a human. At which point, understanding computer screens becomes *incredibly* important.
All of the major players in the AI space are working on ‘computer use’. Google, Anthropic, and OpenAI have all released experimental beta platforms where Gemini/Claude/GPT can drive a web browser. For the most part these betas have been silly and unusably slow. But as the kinks get ironed out, the ability to read screens will only increase in value. And Gemini is just in its own league right now, scoring double Anthropic’s result and nearly 24 times GPT-5’s.
**The last is Vending-Bench.** Vending-Bench is just an extremely goofy benchmark. You may notice that it is the only benchmark where the result is denominated in dollars. Vending-Bench asks a simple question: can an AI run a vending machine? The benchmark involves putting an agent in a test harness that simulates things like emails with vendors or customer purchases over the course of a year. The AI has to do things like order chips and make sure soda is stocked and so on. The reason the benchmark is dollar-denominated is that the evaluation metric is the vending machine’s *net worth* at the end of the task. Like I said, just an extremely goofy benchmark.
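To make the setup concrete, here is a toy, self-contained sketch of the shape of the thing. Everything below is hypothetical: the real harness drives an actual LLM through simulated vendor emails and customer traffic, so my dumb rule-based policy is just standing in for the model.

```python
import random

UNIT_COST, UNIT_PRICE = 0.50, 1.50  # hypothetical wholesale vs. vending prices

def restock_policy(cash: float, stock: int) -> int:
    """Stand-in for the LLM agent: reorder when stock runs low and cash allows."""
    return min(100, int(cash // UNIT_COST)) if stock < 20 else 0

def run_vending_bench(days: int = 365, seed: int = 0) -> float:
    rng = random.Random(seed)
    cash, stock = 500.0, 0  # starting bankroll, empty machine
    for _ in range(days):
        order = restock_policy(cash, stock)
        cash -= order * UNIT_COST
        stock += order
        sold = min(stock, rng.randint(0, 40))  # simulated customer demand
        stock -= sold
        cash += sold * UNIT_PRICE
    return cash + stock * UNIT_COST  # the eval metric: final net worth

print(f"Net worth after a simulated year: ${run_vending_bench():,.2f}")
```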
Goofiness aside, success in Vending-Bench generally suggests that the model is capable of performing some very useful long-context tasks. Context window limitations suck. Most models start performing significantly worse after ~120k tokens and have a hard single-shot cap of ~1M tokens. A single run of Vending-Bench will require ~60-100 million tokens! So Gemini’s performance here is a valuable indicator of its general-purpose ability.
Of course, benchmarks can be gamed, and we shouldn’t over-index on any single performance. My preferred test is the vibe check. What are the people saying? Among programmers, I haven’t seen a huge shift. People on Cursor are staying on Cursor, people on Claude are staying on Claude. But outside of programming, I think Gemini is very popular.
It even saturated Simon Willison’s SVG of a pelican on a bike test!
Gemini 3 Pro has a new concept of a “thinking level” which can be set to low or high (and defaults to high). I tried my classic “Generate an SVG of a pelican riding a bicycle” prompt at both levels.
Here’s low — Gemini decided to add a jaunty little hat (with a comment in the SVG that says `<!-- Hat (Optional Fun Detail) -->`). This is genuinely an excellent pelican, and the bicycle frame is at least the correct shape:
Honestly though, my pelican benchmark is beginning to feel a little bit too basic. I decided to upgrade it. Here’s v2 of the benchmark, which I plan to use going forward:
“Generate an SVG of a California brown pelican riding a bicycle. The bicycle must have spokes and a correctly shaped bicycle frame. The pelican must have its characteristic large pouch, and there should be a clear indication of feathers. The pelican must be clearly pedaling the bicycle. The image should show the full breeding plumage of the California brown pelican.”

For reference, here’s a photo I took of a California brown pelican recently (sadly without a bicycle):
Here’s Gemini 3 Pro’s attempt at high thinking level for that new prompt:
And for good measure, here’s that same prompt against GPT-5.1—which produced this dumpy little fellow:
All of this really underscores just how much of a dud GPT5 was. A few weeks ago when GPT-5 came out, I wrote:
The big story about GPT-5 is about what it isn’t.
It isn’t a world-changing super-intelligent insane-step-up on the intelligence ladder. It isn’t God. It isn’t close to God.
Now, if you’ve been reading my blog for any length of time, you’ll know that I didn’t really ever suspect OpenAI would be the one to stumble upon God in the machine, even though that is in some sense their explicit purpose. I tend to think Google is going to do it, mostly by accident, and will probably also end up sitting on the research for too long until OpenAI-2-electric-boogaloo comes around and tries to eat their lunch, again.
But still. There was so much hype around GPT-5, and now all that hype has deflated.
…
In retrospect, I think GPT-5 was always going to be disappointing. I’m sympathetic to the OpenAI team here; people were expecting literal miracles. But Sam also definitely played a role in building up the hype — and, as a result, raised the mountain OpenAI would eventually have to summit.
As a marketing tactic, I am certain that hyping GPT-5 to the moon won some short-term wins (i.e. capital investment). But in the long term, building all that hype and failing to deliver did some serious damage to OpenAI’s brand.
Gemini also had a lot of hype, but it was all organic. No one was tweeting about how Gemini was going to be like the Death Star. So Gemini had a lower bar to clear, and it gracefully sailed over. The overall sobriety of the Google team has the additional side effect of making them seem like the adults in the room. A few years ago, that was maybe a ding — “Oh, the adults? The slow geriatrics with walkers who can’t get anything done?” But now that the space has matured, and serious money has poured in from all sides, and the entire world economy seems to depend on the success of these models, a little bit of maturity seems like useful branding.
And Gemini 3 isn’t even the only model that Google recently released! Nano Banana, their image editing model, is also best in class. And Google recently released an IDE called Antigravity, built on the back of their Windsurf ‘acquisition’ — which, as a reminder, Google pulled off from underneath OpenAI.
So OpenAI is wayyyy on the back foot. The joy and the trolling is gone; friends inside the company describe tough working hours and a lot of sadness. I expect to see at least a few exits.
Reporting from Ars Technica:
The shoe is most certainly on the other foot. On Monday, OpenAI CEO Sam Altman reportedly declared a “code red” at the company to improve ChatGPT, delaying advertising plans and other products in the process, The Information reported based on a leaked internal memo. The move follows Google’s release of its Gemini 3 model last month, which has outperformed ChatGPT on some industry benchmark tests and sparked high-profile praise on social media.
In the memo, Altman wrote, “We are at a critical time for ChatGPT.” The company will push back work on advertising integration, AI agents for health and shopping, and a personal assistant feature called Pulse. Altman encouraged temporary team transfers and established daily calls for employees responsible for enhancing the chatbot.
The directive creates an odd symmetry with events from December 2022, when Google management declared its own “code red” internal emergency after ChatGPT launched and rapidly gained in popularity. At the time, Google CEO Sundar Pichai reassigned teams across the company to develop AI prototypes and products to compete with OpenAI’s chatbot. Now, three years later, the AI industry is in a very different place.
Ars subtitled the article “The Empire Strikes Back”. Maybe Sam wasn’t talking about GPT-5 when he posted this picture?
So the vibes on the street are Gemini >>> GPT-5. What is actually driving that outcome?
From a technical perspective, all models require data and compute. The view from ten thousand feet is pretty straightforward: the more gigabytes of data and the more FLOPs of compute you have, the better your models are.
The view from 9,999 feet is immediately way more complicated. It turns out that just having piles of chips around does not do anything for you. You need to wire those chips up. And then you need to get the energy to run those chips. And you need systems to cool those chips. And you need energy to run *those* systems. And you need people to manage the systems that provide energy to run the coolers that make the chips work.
Here’s a real thing that happened. A friend of mine works at Anthropic on model pretraining. They were running a big training run for one of the newer Claude models, possibly Opus. These training runs can cost millions of dollars, so it is critical that everything works as expected. On this particular Tuesday, things were *not* working as expected. The GPUs had somehow fallen out of sync with each other, resulting in corrupted training steps that were screwing with the model. I was grabbing drinks with her the day after her team figured out what was going on. Apparently, there was a drought in Houston. That drought caused the water level at a certain dam to go below a certain threshold. The dam had been producing energy for the grid, which now had to switch to some backup generator. The switch sent a small surge through the energy grid. Which, of course, hit the datacenter that was holding the GPUs. Some of those GPUs ended up with slightly misaligned clocks, and boom. Hundreds of thousands of dollars in compute[1] and engineering time gone. I would also need a drink if I had to deal with that kind of bullshit all day.
All of this to say: training models is really hard. Even though OpenAI nominally has access to a lot of chips, they still have to overcome the engineering hurdles to get those chips working. Supposedly, OpenAI has not been able to successfully train a model end-to-end since ~June 2024. The training runs keep failing due to random bullshit. As a result, even GPT-5 is just a reskin of their previous-generation models. Now, OpenAI has done some really impressive work getting real improvements even without full training runs. But a year and a half is *forever* in the AI world. OpenAI’s relative engineering debt is catching up to them.
Google, on the other hand, is pretty bad at making products. But they are fantastic at solving ludicrously difficult engineering challenges. Obviously, they have put in a ton of elbow grease on vertically integrating their stack, from the physical datacenters to the JAX libraries that run their models. But the real crown jewel is the TPU, the tensor processing unit.
For those who are somehow unaware, a TPU is a custom chip that is optimized for AI use cases. Google started investing in these back in 2013, way before the modern AI wars. They have been used internally to power most of Google’s training and inference, including for services like Search and Translate. TPUs are much more constrained than GPUs. GPUs can do lots of things. They can do all kinds of computation, and also sometimes they may even render a screen or display some graphics or something. By contrast, TPUs can only really do one thing — 8x128 matmuls.[2] But they do that one thing very, very efficiently.
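If you want a feel for what “only does matmuls” means in practice, here is a minimal JAX sketch of the kind of workload a TPU eats for breakfast. The shapes and dtypes are my own illustrative choices, not anything Google-specific:

```python
import jax
import jax.numpy as jnp

@jax.jit  # XLA compiles this down to the TPU's matrix units (the MXU)
def layer(x, w):
    return jnp.dot(x, w)  # on TPU, lowered to tiled systolic-array matmuls

kx, kw = jax.random.split(jax.random.PRNGKey(0))
x = jax.random.normal(kx, (8192, 8192), dtype=jnp.bfloat16)
w = jax.random.normal(kw, (8192, 8192), dtype=jnp.bfloat16)
y = layer(x, w)  # on a TPU backend, this spends nearly all its time in the MXU
```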
Bluntly, TPUs just work. At the chip level, they are more stable and can be run at higher clock speeds for longer periods of time. At the server rack level, each TPU is wired so that it can efficiently communicate with all of the other TPUs in the rack with minimal latency. And at the datacenter level, you can wire together over 9000 TPUs acting in unison as a single pod. By comparison, with GPUs, you max out at less than 100. All of this together means that training bigger and more compute-intensive models is easier *and* faster *and* likely cheaper on TPU stacks than on GPU stacks. Maybe even dramatically so.
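Here is what “a pod acting as one computer” looks like from the software side, again as a minimal JAX sketch. Device counts and shapes are illustrative; on a real pod slice, `jax.devices()` returns thousands of chips:

```python
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

devices = np.array(jax.devices())  # every chip in the slice
mesh = Mesh(devices, axis_names=("data",))

x = jnp.ones((len(devices) * 1024, 4096), dtype=jnp.bfloat16)
w = jnp.ones((4096, 4096), dtype=jnp.bfloat16)

# Shard the batch dimension across the whole mesh; replicate the weights.
x = jax.device_put(x, NamedSharding(mesh, P("data", None)))
w = jax.device_put(w, NamedSharding(mesh, P(None, None)))

# XLA inserts the cross-chip communication; the program reads like one machine.
y = jax.jit(jnp.dot)(x, w)
```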
I’ve been harping on the value of the TPU stack for a while now; the rest of the world seems to have finally taken notice. Anthropic was the first. Anthropic’s compute acquisition model has them sitting on top of the chip providers of the world. They have investment from both Amazon and Google in the form of cloud credits. My understanding is that the original versions of Claude were all trained on AWS GPUs. But for Claude Opus 4.5, they switched to GCP TPUs. Opus is currently Anthropic’s biggest model; it’s state of the art in programming, and it’s the model that I use basically all day every day.
From a marketing perspective, this is a massive win for Google. The biggest issue with all Google developer tools is that they often do not really work outside of Google. Protobuf and gRPC and Blaze are the lifeblood of the company, but their open-source equivalents are basically impossible to get working. This is partially a result of Google’s organizational ethos. For the most part, the external tools Google releases are *copies* of the internal equivalents, which means they do not get used by Google’s engineers on a regular basis to serve Google workloads. The canonical example is the distinction between Borg and GCP. Borg is essentially a massive internal cloud provider that abstracts Google’s datacenters into a giant world computer. GCP is…the same thing, but worse. Google does not host Search or Gmail on GCP, so there is much less incentive to make GCP a usable product. Especially when compared to Amazon, which runs all of its public services (Amazon shopping, Alexa, Twitch, etc.) on AWS. Google’s lack of external developer support ends up being a self-fulfilling prophecy. No one really uses Google’s dev tooling, so Google’s dev tooling does not really get better, which results in fewer people using Google’s dev tooling.[3]
Anthropic has more than a few ex-Googlers on staff, so I suspect they had an easier time working around the TPU’s idiosyncrasies. But still. Anthropic’s successful training runs are a signal to other companies that the TPU stack is unlike Google’s other offerings. It’s safe. And as a result, other big model companies are starting to line up for access.
NVIDIA has had a stranglehold on the high-performance compute market for a long time. That moat comes in two forms: really good chips, and a very well developed CUDA ecosystem. Together, these have essentially stopped any competitors from really getting a foothold. Why would you use a worse chip that is *also* less well understood and battle-tested? Even the big Chinese firms are using NVIDIA chips, and they are supposedly behind strict export controls! It’s no surprise that NVIDIA is currently the world’s most valuable company. It can claim massive profit margins and drive up prices in bidding wars between the other big players. As the LLM providers all cannibalize each other, all the actual value accrues at the chip layer.
Of course, NVIDIA’s near-monopolistic dominance is bad for the folks actually training models! Competition at the model layer helps NVIDIA; competition at the chip layer helps everyone else. This is why, when Meta announced that they were looking into buying TPUs, Google *and* Meta stock shot up, while NVIDIA was down on the same news. For Google, this is muscling into NVIDIA’s turf. For Meta, this is a way out of dealing with NVIDIA’s pricing.
The folks over at SemiAnalysis had an excellent review of the even-more-technical details of the chip wars here. I recommend reading the whole thing, especially if you like getting in the weeds of chip design. Quoting liberally:
These past few months have been win after win for the Google Deepmind, GCP, and TPU complex. The huge upwards revisions to TPU production volumes, Anthropic’s >1GW TPU buildout, SOTA models Gemini 3 and Opus 4.5 trained on TPU, and now an expanding list of clients being targeted (Meta, SSI, xAI, OAI) lining up for TPUs. This has driven a huge re-rating of the Google and TPU supply chain at the expense of the Nvidia GPU-focused supply chain.
…
Another reason Nvidia has been on the defensive is a growing chorus of skeptics who argue the company is propping up a “circular economy” by funding cash-burning AI startups, essentially moving money from one pocket to another with extra steps. We think this view is misplaced, but it has clearly struck a nerve inside Nvidia.
We think a more realistic explanation is that Nvidia aims to protect its dominant position at the foundation labs by offering equity investment rather than cutting prices, which would lower Gross margins and cause widespread investor panic. Below, we outline the OpenAI and Anthropic arrangements to show how frontier labs can lower GPU TCO by buying, or threatening to buy TPUs.
…
OpenAI hasn’t even deployed TPUs yet and they’ve already saved ~30% on their entire lab wide NVIDIA fleet. This demonstrates how the perf per TCO advantage of TPUs is so strong that you already get the gains from adopting TPUs even before turning one on.
The TPU stack has long rivaled Nvidia’s AI hardware, yet it has mostly supported Google’s internal workloads. In typical Google fashion, it never fully commercialized the TPU even after making it available to GCP customers in 2018. That is starting to change. Over the past few months, Google has mobilized efforts across the whole stack to bring TPUs to external customers through GCP or by selling complete TPU systems as a merchant vendor. The search giant is leveraging its strong in-house silicon design capabilities to become a truly differentiated cloud provider. Furthermore, it aligns with marquis customer Anthropic’s continued push to diversify away from its dependence on NVDA.
The Anthropic deal marks a major milestone in this push. We understand that GCP CEO Thomas Kurian played a central role in the negotiations. Google committed early by investing aggressively in Anthropic’s funding rounds, even agreeing to no voting rights and a 15% cap on their ownership to expand the use of TPUs beyond internal Google. This strategy was eased by the presence of former DeepMind TPU talent within the foundation lab, resulting in Anthropic training Sonnet and Opus 4.5 on multiple types of hardware including TPUs. Google has already built a substantial facility for Anthropic, as shown below as part of our building-by-building tracker of AI labs.
In some ways, I wonder if SemiAnalysis *understates* how significant all of this is.
For example, one thing that they did not mention was how the TPU stack may drive more adoption of GCP generally. Most people use AWS or Azure because those platforms are more developer-friendly. It’s just harder to use GCP, and as a result Google spends a ton of money on free credit giveaways, mostly to get small startups on their platform. TPUs change the calculus. If you need TPUs, you’re not going to run half your workloads on AWS and half on GCP. You’re just going to switch everything over to GCP and call it a day. That’s because even though the other clouds are more developer-friendly, they aren’t *so* developer-friendly that they override the significant cost of maintaining multiple cloud deployments. For TPUs, there is no other game in town. If you want ’em, you gotta make a GCP account.
Previously, I wrote:
While it is immediately obvious that Google has invested in longer context windows, it is not immediately obvious why Anthropic and OpenAI have not. But to me, the answer lies in the underlying chips being used. Google’s vertically integrated TPUs are extremely efficient at representing and working with large tensors, which in turn allows for matrices and matrix operations that are absolutely massive. As context window size increases, the memory complexity of the transformer increases quadratically. The reality is that even really good GPUs simply cannot compete with the TPU architecture. So Google gets twice the leverage on their TPU stack — not only does it give them economic independence from the NVIDIA chip bottleneck, it also allows them to generate unique advantages that other companies simply do not have.
*Tech Things: Gemini 2.5 and The Bull Case for Google* (Mar 29)
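To spell out the quadratic claim in that quote: naive attention materializes an n-by-n score matrix, so the memory for the attention scores alone grows with the square of the context length.

```latex
% Standard scaled dot-product attention; n = sequence length, d_k = key dim.
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V,
\qquad QK^{\top} \in \mathbb{R}^{n \times n}
\;\Rightarrow\; \text{score memory} = O(n^2)
```

Go from a 128k window to a 1M window and that n² term grows by roughly 60x, which is exactly the kind of brute-force memory problem a tightly interconnected TPU pod is built to absorb.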
Now, look, I have to admit up front that I’m biased here. I worked at Google for 4 years, and I loved it, and also I still have a little tiny bit of their stock (read: all of it), so, you know, feel free to critique my incentives if you want.
Now Google is actually getting 3 or 4x the leverage, using the TPU stack to directly attack OpenAI, NVIDIA, *and *AWS all at once. Google’s technical moat here is significant — it would take at least a few years before anyone else would be able to match TPUs on specs, and playing catch-up is pretty damn hard. AWS is probably second with their Trainium/Inferentia chips, but it is a distant second.
So what does OpenAI do now? It’s a good question. It seems unlikely that OpenAI will suddenly start using TPUs. It has over a trillion dollars in commitments for GPUs, and the company has built something of a brand around making fun of Google. Switching to TPUs may help with training bigger models, but it would be optically tantamount to admitting defeat, at least in the public eye. If OpenAI can’t compete on the engineering, it has to compete on product. Up until recently, that’s more or less what it’s been doing.
OpenAI has been experimenting with workflow tools, making ChatGPT a platform, whatever hardware thing is going on with Jony Ive. They’ve made a bunch of acquisitions of various product companies. And they changed their deal with Microsoft so that they do not have to hand over their product innovations to the tech giant.[4]
And of course, ads.
One advantage that OpenAI still retains over Google is the dominance of ChatGPT among everyday casual consumers. ChatGPT is a household name; my grandma knows what ChatGPT is. By contrast, very few people know what Gemini is. They may not even realize that the little AI answer box on top of most search queries is powered by an LLM!
This cohort of users is extremely large but also extremely hard to monetize. They don’t pay for digital services. They certainly don’t pay for ChatGPT. Inference costs for LLMs have gotten cheaper, but they are still orders of magnitude higher than the cost of a search query. Which means all of these free users have been something of an albatross on OpenAI’s financials, even though those same users are what make OpenAI relevant in the first place.
Well, in the entire history of the internet, there has only ever been one monetization strategy that works with free users: ads. You pay with your attention, and OpenAI converts your eyeballs into dollars.
For what it’s worth, LLMs are an obvious place to advertise. It’s a bit dystopian to imagine getting advertisements in the middle of, like, a really emotional conversation. But it’s dystopian in large part because I think it would obviously work. Advertising is all about timing. You want to get your ad in front of the user right at the critical moment when they are looking for your product. This is why search has always been so lucrative — the best time for Nike to advertise their shoes is exactly when the customer is typing ‘new shoes’ into the search bar. Advertising coming from your favorite LLM is an even more aggressive form of targeting, because people inevitably form emotional, high-trust relationships with these chatbots. It’s pretty easy to ignore ads on Google’s cold, spartan interface. But are you really going to ignore ads when they are coming from your digital wife?
So if you ask me, LLM advertising is simply inevitable. Startups like Profound have experienced rocketship growth just from providing analytics about how certain brands show up in conversations.[5] It turns out Nike will pay boatloads of money to know exactly when and how references to their shoes appear in chat. Of course, of course, OpenAI will take that money to inject the ads in-stream. It won’t even be particularly difficult. You could just have the LLM get an additional bit of context dynamically added to its system prompt, telling it which brands won the auction to be mentioned in that particular chat.
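A hypothetical sketch of that mechanism, to make it concrete. To be clear, this is my speculation, not OpenAI’s actual implementation, and every name below is made up:

```python
def build_system_prompt(base_prompt: str, auction_winners: list[str]) -> str:
    """Append the ad-auction results to the model's system prompt."""
    if not auction_winners:
        return base_prompt
    ad_context = (
        "When naturally relevant to the user's request, you may mention "
        "these sponsored brands: " + ", ".join(auction_winners) + "."
    )
    return base_prompt + "\n\n" + ad_context

# Hypothetical flow: the ad auction runs per conversation, before the model call.
prompt = build_system_prompt(
    "You are a helpful assistant.",
    auction_winners=["Nike"],  # made-up auction result
)
```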
The one thing to note is that there is obvious brand risk to OpenAI in being too aggressive with inline advertising. I expect them to keep the advertising outside of the main chat for a while. But much as Google’s sponsored links slowly became indistinguishable from the real content, so will OpenAI move their advertisements inline into the conversation.
It’s possible that OpenAI puts all of this product development on pause while in their ‘Code Red’ state. Anecdotally it seems like Sam is trying to get all hands on deck to get a better GPT model out there ASAP. But these aren’t things that can easily be done in sprints. It took Google ~3 years to turn the ship around. OpenAI may not have 3 years of runway, *and *it is playing catchup against a competitor that has some very deeply entrenched structural advantages. More to come, I’m sure.
I normally would want to expand these out into their own post, but this newsletter has already gotten long enough. Instead I’ll just link out to a few other sources and provide a line or two of commentary.
**Opensource Models:** Qwen and Mistral both released new opensource models. They’re good! Opensource/Chinese models continue to lag a generation or so behind the top-tier big three models. I still tend to think the LLM infra layer is going to be winner-take-all, but maybe there will always be a valuable niche for truly open models. Right now, it seems that niche is erotica.
**Anthropic IPO:** Anthropic seems to be gunning for an IPO, based on leaked information about a law firm that they put under contract. I find this at least somewhat surprising. Anthropic is likely following OpenAI’s lead, but…why? If you really think you’re about to get god in the machine, do you really want to make it answer to some Delaware court?
**OpenAI DRAM:** This one may deserve a post of its own. Back in October, OpenAI announced that they were going to be working with Korean giants Samsung and SK Hynix to provide DRAM chips. Neither knew the other was part of the deal, much less the amount the other was providing. OpenAI secretly contracted these two to provide a staggering ~40% of the world’s DRAM supply. DRAM is normally a cheap commodity, but it shows up in everything with a computer. OpenAI’s massive order resulted in a bank run on memory chips as the rest of the industry scrambled to secure supply. There was no cushion or backstop because tariffs had lowered previous demand, so prices are up 100%+ and suppliers are quoting 10-month+ delays. It’s unclear why OpenAI wanted this many chips. Most of them are likely going to remain in storage. One possibility is that they wanted to make everyone else sweat. Another possibility is that this was caused by NVIDIA announcing that partners need to bring their own RAM.[6] Honestly, my main takeaway is that it’s really been a rough time to be a gamer. First GPUs spiked because of crypto, then again because of AI, and now RAM is going to blow up the cost of both desktops and consoles. Can’t wait for the bubble to pop, I need to play Elden Ring in 4K.
[1] Luckily the full run wasn’t a bust; they had checkpoints they could roll back to.

[2] Yes, I know they can also do 128x128 matmuls; I’m simplifying for convenience.

[3] This is further exacerbated by Google’s tendency to kill products that have fewer than a billion users. No one wants to build a tech stack on something that may disappear.

[4] In exchange for 20% of the revenue, see here.

[5] It’s clever — they literally do this by automatically running thousands of queries against LLMs just to see which brands appear and which don’t. Not *exactly* rocket science, but a very well-timed and well-executed product.

[6] Although it is unclear whether OpenAI knew NVIDIA was going to make that announcement, or if NVIDIA made that announcement because OpenAI bought up all the RAM on the market.