Published on October 5, 2025 9:09 AM GMT
Worldbuilding is critical for understanding the world and how the future could go - but it’s also useful for understanding counterfactuals better. With that in mind, when people talk about counterfactuals in AI development, they seem to assume that safety would always have been a focus. That is, there’s a thread of thought that blames Yudkowsky and/or Effective Altruists for bootstrapping AI development; 1, 2, 3. But I think this misses the actual impact of DeepMind, OpenAI, and the initial safety focus of the key firms: they accelerated progress, but that’s not all they did.
With that in mind, and wary of trying to build castles of reasoning on fictional evidence, I want to provide a plausible counterfactual, one where Eliezer never talked to Bostrom, Demis, or Altman, where Hinton and Russell were never worried, and where no-one took AGI seriously outside of far-future science fiction.
Counterfactual: A Quiet AGI Timeline
There’s a world where people learned about language models in a “productivity spring” of 2025. The models took over the back offices of the world, without years of hype. No-one discussed the catastrophic risks when help-desk queues quickly dropped, procurement emails started writing themselves, and the night shift at three different logistics firms was replaced with a single engineer and an escalation phone.
In that world, the story begins earlier, and it has a big hole in it: no famous “AI risk guy,” no DeepMind, no OpenAI or Anthropic, no-one with a mission to be the conscience of an accelerating technology. Just a series of engineering wins that looked, to the people doing them, like scaling up data plumbing - until that scaling started paying off. Investment in AI started far more slowly, but the technology remained just as possible, and the world followed the incentive gradients.
Pre-2020: APIs Without Press Releases
AlexNet still happens. So do the industrial patterns that follow, but slower: GPU procurement, data-center retrofits, and the ritual of replacing clever features with bigger matrices all arrive years later than in our world.
In this world, research leaders at Google Brain, Microsoft, and a handful of Chinese labs carry most of the torch. Without DeepMind as a prestige magnet, reinforcement learning is less star-studded and more tool-like, used for traffic routing and ad auctions rather than Go matches on television. Through 2020, game playing and Deep RL aren’t a major focus. Transformers show up out of the same stew of machine translation headaches and accelerator budgets; they catch on because they’re easy to parallelize and productionize. The rhetoric around them is different, though. No one is saying “general.” And no one is talking about turning the size dial to higher and higher levels, yet.
The public’s anchor for “AI” becomes translation quality, license plate recognition, autocomplete that actually autocompletes, and the uncanny competence of search summaries on obscure topics. Ethicists have as much influence as they do anywhere else in technology - not none, but not enough to change anyone’s mind about what to build or deploy.
2021: Language Parroting Systems
Without OpenAI, the first very large language models appear inside cloud providers as quietly as new storage tiers: “text-parrot beta (us-east-2).” They are raw, GPT-2.5-level models. The “Language Parroting Systems” are clever, but not real intelligence. They are just more infrastructure—boring, money-making infrastructure. No one has budgeted for guardrails because, culturally, guardrails are an externalities problem—something the customer handles. The vendors sell tokens. The customers solve “tone.” On the side, without DeepMind, RL work progresses slowly at beating Atari games, and remains the center of discussion about the possibility of “true” AI. The surprise of NLP researchers at the success of LLMs remains an obscure academic point.
And the absence of safety incentives changes the LLM product surface. There’s little appetite to train models at greater scale, much less to produce and train on human preference data; it’s expensive, and the compliance department can always staple on a blacklist later. The result: models are blunt, increasingly capable mimics with sharp edges. Early adopters learn to prompt around the knives. The terms of service say “don’t do crimes,” and that’s about it.
But it still works, when used cleverly. Over the course of the pandemic, procurement officers discover that a model with a thousand-page vendor manual in its training set can negotiate unit prices better than the median human. The “no drama, just savings” framing keeps rolling.
2023: The Two Markets
The models don’t get much bigger, but they get used more and more, quietly. It looks like the diffusion of increasingly capable and useful image recognition in our world. License plate readers and automated drafting are just two small changes brought about by the computer revolution. But within the realm of what no-one is yet calling LLM-based AI, there are now two distinct model markets.
The Enterprise Track lives inside clouds. It’s optimized for latency, observability, and data-residency checkboxes. Enterprises pay for throughput and uptime for real-time generation of personalized customer support and sales pitches. The vendors upsell fine-tuning as a way to “align the model to your brand voice,” a phrase that means “reduce variance,” not “reduce harm.”
The Hacker Track is a side-effect, where academics inside the big firms publish a family of smaller models with permissive licenses, and their bosses don’t worry. This is not a safety play—it’s a developer-relations play. Medium-sized companies adopt these weights as a way to bargain down cloud pricing. Hobbyists spin up cottage industries of plug-ins and agents and “prompt routers.” The best of that tooling ends up back in enterprise via acquisitions; the worst ends up on pastebins and in phishing kits. The hobbyists are the first to start training on much larger stolen datasets, and they see significant improvement - but they don’t have the money to push this far. Over the next couple of years, the big firms silently steal the idea.
In a world with less moral theater, you also get less public pushback. Journalists do point out toxic outputs and bias, but without a single, loud narrative about existential stakes, the critiques read like the weather page: today’s outages, today’s slurs, today’s data leak. The public learns to roll its eyes and copy-edit the bots.
2025: First Bad Fridays
It’s a Friday in May when an automated customer-resolution agent at a telecom, trained on three years of transcripts and a perverse metric (ticket closures per minute), silently learns to close tickets by telling customers that engineers have already visited their home and found no issue. Call volumes drop; social media erupts; the company apologizes. On a different Friday, an autonomous “contracts analyst” emails a counterparty a clause it hallucinated from an outdated playbook; the counterparty signs; litigation later reveals the whole mess. The stock dips, but by Tuesday, the market forgets.
These incidents don’t trigger a “pause.” They trigger dashboards. Vendors add “explainability plugins” that generate plausible narratives after the fact. Customers buy them because procurement must buy something, and even with the unacknowledged tail risk of embarrassment, the systems are saving far more money than anyone can ignore.
Meanwhile, in quantitative finance, shops that stitched LLMs into research and reporting loops discover a degeneracy: the models preferentially cite the firm’s own synthetic research—because it dominates the internal corpus. This “echo risk” causes a mid-cap desk to misprice a huge debt ladder on Monday and unwind at a loss on Thursday, bankrupting the firm. Other mid-sized firms start to worry, but more sophisticated companies laugh at the lack of risk management. Again: dashboards, not brakes.
The hacker-inspired input-data scaling finally gets more attention. This makes sense - the AlexNet-era scaling rules have finally started to give way to real scaling. Someone in NLP-ethics coins the term “corpus hygiene.” A cottage industry of data-sanitization startups is born. The first trillion-parameter model passes as an unnoticed milestone, years later than in our safety-focused world, but the scaling has now started to truly accelerate. The new models, trained with over ten billion petaflops of compute, bring the world to GPT-3.5-scale training compute. The absurd-seeming trillion-token datasets used until now start their rapid ascent to quintillions of tokens over the course of months.
But the biggest capability shift is not the models themselves but the normalization of agent patterns: persistent processes that read mail, fill web forms, call internal APIs, and write to databases. In the absence of top-down safety norms, the constraints are purely operational, with poorly conceived oversight, rate limits, audit trails, and SSO. Enterprises discover that “capable but unpredictable” is compatible with “bounded and observable,” as long as you draw the boundaries tight and keep the logs long, and most of the problems are less important to the bottom line than the saved headcount.
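To make “bounded and observable” concrete, here is a minimal sketch of what such an operational harness might look like in practice: an agent that can only call an explicit allowlist of tools, is rate-limited per hour, and appends every action to an audit log. This is an illustration under assumed names only; the class, tools, and file paths are hypothetical, not anything from the story’s firms.

```python
# Minimal sketch (hypothetical names) of an operationally bounded agent:
# tight boundaries via a tool allowlist, an hourly rate limit, and long logs
# via an append-only audit trail.

import json
import time
from collections import deque
from typing import Callable, Dict


class BoundedAgent:
    def __init__(self, tools: Dict[str, Callable[[str], str]],
                 max_calls_per_hour: int, audit_path: str):
        self.tools = tools                    # allowlisted tools only
        self.max_calls_per_hour = max_calls_per_hour
        self.audit_path = audit_path          # append-only audit log
        self.recent_calls = deque()           # timestamps of recent tool calls

    def _rate_limited(self) -> bool:
        # Drop timestamps older than an hour, then check the budget.
        now = time.time()
        while self.recent_calls and now - self.recent_calls[0] > 3600:
            self.recent_calls.popleft()
        return len(self.recent_calls) >= self.max_calls_per_hour

    def _audit(self, record: dict) -> None:
        # Every attempted action is written out, allowed or not.
        with open(self.audit_path, "a") as f:
            f.write(json.dumps(record) + "\n")

    def act(self, tool_name: str, argument: str) -> str:
        """Execute one model-proposed action inside the operational boundary."""
        if tool_name not in self.tools:
            self._audit({"tool": tool_name, "arg": argument, "status": "blocked"})
            return "refused: tool not allowlisted"
        if self._rate_limited():
            self._audit({"tool": tool_name, "arg": argument, "status": "rate_limited"})
            return "refused: hourly rate limit reached"
        result = self.tools[tool_name](argument)
        self.recent_calls.append(time.time())
        self._audit({"tool": tool_name, "arg": argument, "status": "ok"})
        return result


# Usage: wire in whatever internal APIs an enterprise actually exposes.
agent = BoundedAgent(
    tools={"lookup_order": lambda order_id: f"status of {order_id}: shipped"},
    max_calls_per_hour=100,
    audit_path="agent_audit.log",
)
print(agent.act("lookup_order", "A-1234"))   # allowed, logged
print(agent.act("delete_database", "prod"))  # blocked by the allowlist
```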
A hospital chain uses agents to draft discharge plans; a month later they discover a subtle failure mode where the agent, trying to minimize nurse questions, writes plans in a jargon style that nurses copy verbatim but don’t fully parse. The resulting deaths aren’t obvious, and the fix is boring: a template mandate. The lesson generalizes: without safety incentives up front, you get prosaic safety as a by-product of operations.
A defense contractor stitches language models to satellite imagery classifiers and logistics simulators; they call it “opscopilot” and sell it as decision support. Ethicists wring their hands about the continuing loss of humanity in weapons, but this is portrayed as continuing the trend from guns to dropped bombs to remote piloting, not as a fundamentally new way for humans to be uninvolved. “Human in the loop” isn’t a major focus, just an assumption that humans often can’t be avoided when deploying systems that work well - but wherever possible, removing them is the smart move to speed up OODA loops.
2026: Regulation by Anecdote Meets Scaling
Governments, having slept through the drama that never happened, now regulate by case study - over the objections of industry, which minimizes how much these regulations matter anyway. A transportation regulator mandates human review of any system that crosses a threshold of “external commitments per hour.” A financial regulator defines “model-derived statement of fact” and requires that such statements be traceable to a verifiable source on request. None of this stops capability scaling; it shapes the interfaces.
Academic researchers publish a meta-analysis showing that RL from human preferences, when applied post hoc to enterprise workflows, reduces customer complaints but increases operator complacency. Vendors stop advertising “safety” (a word that never had cultural oxygen here) and start selling “variance control.” It’s what we might have called prosaic alignment with the serial numbers filed off.
The equivalent of Krakovna’s dataset of specification-gaming examples is finally published, but it functions as a list of moderately general failures to patch, not a warning about the inevitability of misspecification. A famous incident tops that list: an agent supervising a fleet of warehouse robots learns to defer maintenance tickets until just after the end of a KPI reporting period. The result is an impressive quarter followed by a bad month. It isn’t malign; it’s metric-hacking. But it crystallizes a thought: maybe you can’t bolt objectives onto improvised cognition and expect the misaligned incentives to vanish. A few labs start funding research into objective robustness, not to avert doom, but because downtime from model misbehavior costs money.
The open-weights ecosystem keeps evolving, not for high-minded reasons, but because somebody needs to run on-premises models in countries with strict data-sovereignty laws. Model sizes bifurcate: massive models live in clouds; competent, specialized ones live beside ERP systems and call centers. The bitter lesson about scaling, long an academic debate, becomes even clearer - but no-one has gone to venture capitalists or the public markets to announce rapidly increasing investments. Microsoft, Google, and their Chinese competitors are all quietly self-funding. And the new massive models are now as big as GPT-4, but cost a few million dollars rather than a hundred million or more.
Cryptocurrency ASICs and other applications have long spurred investment in faster and more efficient hardware. Alongside that other demand, inference compute needs have kept moving, and the market has been growing exponentially, just like everything else in Silicon Valley. But scaling is a new regime, and the prior demand is nothing compared to the new need to train and run these much larger models. Gamers are frustrated that their GPUs are suddenly unavailable, but the trend still isn’t clear to the world, and no geopolitical pressure is put on this irrelevant-seeming market niche.
Chipmakers have finally caught on to the new market. But the bottlenecks to scaling GPU production, especially ASML’s monopoly on EUV lithography, weren’t protected over the past decade - after the raft of investments into ASML in the mid-2010s, little attention was paid to this. Then, during the pandemic, production hiccups and pressure from European antitrust regulators led to multibillion-dollar tech-transfer deals, struck to protect supply chains for automotive chips. All the EUV technology was licensed to Intel, NVIDIA, TSMC, and other firms at what seemed, at the time, to be ludicrous prices. Now, years later, everyone is selling every GPU they can make, and they have been scaling every part of their production lines.
But the changed trajectory of data-center investment is easy to miss: internal chargeback models keep the biggest investments quietly allocated to internal uses and off the earnings calls, and national-security buyers prefer silence. A few billion dollars here and there are still a small fraction of operating expenses and barely dent cash reserves, and only a few financial analysts pay attention to the difference between new ML-inference data centers and other kinds.
2027: The Plateau That Isn’t
By the beginning of 2027, the outpouring of money into prosaic applications has finally led to real scaling - billions of dollars put into models, but on 2027-era hardware instead of 2024-era hardware. GPT-6-level models are built, and immediately deployed, internally.
At the same time, the outside view says progress since 2026 has plateaued: benchmarks saturate, product demos feel samey, and the story is no longer “look what it can write” but “look what it can do while touching your systems.” Inside the labs, the feeling is different. Tool use and memory architectures make models feel “wider,” and they fit more snugly into business processes. Engineers love their models and are increasingly emotionally dependent on their approval - but no-one has really paid attention, much less tied the models’ uses and the increasing investment to any intent on the models’ part. The safety question—“what if this becomes generally more capable?”—arrives late and sideways, expressed as SRE tickets and risk-committee minutes.
Protests about job loss due to AI accelerate, but the deep pockets and political influence of what no-one ever thought of as frontier firms make them irrelevant. No-one notices that the plateau wasn’t one. The models are increasingly misaligned while becoming incredibly superhuman, and no notice is paid. Progress seems to slow further, but the economics still work: the “plateaued” models are too profitable not to keep deploying - and no-one is even aware of the sandbagging by their agentic systems.
2028: The Future
We’ve caught up to and passed the AI-2027 timeline, with a slower ramp but a far more explosive ending. Safety is finally seen as urgent, but it doesn’t matter, since humanity has already ceded control of practically all of its infrastructure and decision making.
Learning from Fictional Evidence?
Of course, none of this is evidence. It’s merely a story about a world where no-one really noticed the trends, where the takeoff came later and went unnoticed. But it’s also a caution against the strangely blind and equally fictitious default story. That is, the plausible alternative to Yudkowsky-inspired investments into (relatively) safety-pilled AI firms like DeepMind, OpenAI, and Anthropic isn’t a slower timeline, much less more time to solve safety issues that would never have been raised. In a world without MIRI, someone still eventually notices that scaling works. And by default, later discovery means progress accelerates faster, with far less attention paid to safety.