Published on January 22, 2026 12:18 AM GMT
What do Solomonoff induction, transformers, and the human genome all have in common? They perform well in universes where the laws of physics are invariant across time.
That might sound trivial, but regularity is actually a pretty distinctive property of our little corner of the multiverse. After all, for any universe where the strength of the fundamental forces stays constant across time, it’s easy to imagine others where things like the strength of gravity or magnetism are fluctuating every second. The fluctuations themselves might not even be predictable. They might instead be determined by a massively complex random number generator, which we have no hope of reverse-engineering.
Fortunately, we don’t live in a world like that. We live in a world where the laws that govern the future are the same as those that governed the past. We live in a world that runs like clockwork.
This is a great boon to any systems learning to predict the future from our world’s past behavior. For instance, consider the human genome, as it mutates and gets transmitted from parent to child, evolving across generations. Insofar as this process “works”, it works by endowing children with genes that were effective during their parents’ lifetime, on the implicit assumption that what worked for the parents should also work for the children.
And this assumption tends to work out pretty well for evolved organisms, thanks to the deeply regular causal structure of the universe. It really is the case that, if a cell encounters a given situation time and time again, the same response is likely to be appropriate each time. This is partly because, modulo some quantum randomness, situations with identical initial conditions will always play out the same way under our laws of physics.
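To make this concrete, here's a minimal toy simulation (the parameters and fitness setup are invented for illustration). Selection steadily improves a population of bit-string "genomes" when the environment they're scored against stays fixed across generations, and gets nowhere when that environment is redrawn every generation:

```python
import random

def fitness(genome, target):
    """How well a genome matches the current environment (the 'laws')."""
    return sum(1 for a, b in zip(genome, target) if a == b)

def evolve(stable_environment, generations=200, pop_size=100, genome_len=20, seed=0):
    rng = random.Random(seed)
    target = [rng.randint(0, 1) for _ in range(genome_len)]
    pop = [[rng.randint(0, 1) for _ in range(genome_len)] for _ in range(pop_size)]

    for _ in range(generations):
        if not stable_environment:
            # "Fluctuating laws": the fitness landscape is redrawn every generation,
            # so what worked for the parents says nothing about the children.
            target = [rng.randint(0, 1) for _ in range(genome_len)]
        # Keep the fitter half of the population...
        pop.sort(key=lambda g: fitness(g, target), reverse=True)
        survivors = pop[: pop_size // 2]
        # ...and fill it back up with mutated copies of the survivors.
        children = []
        for parent in survivors:
            child = parent[:]
            child[rng.randrange(genome_len)] ^= 1  # single-bit mutation
            children.append(child)
        pop = survivors + children

    # Average fraction of the environment the population has "learned".
    return sum(fitness(g, target) for g in pop) / (pop_size * genome_len)

print("constant laws:   ", evolve(stable_environment=True))   # climbs toward 1.0
print("fluctuating laws:", evolve(stable_environment=False))  # stays near chance, ~0.5
```

The only difference between the two runs is whether the "laws" hold still long enough for what worked for the parents to keep working for the children.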
We can use similar logic to explain the utility of Solomonoff induction. For those unfamiliar, Solomonoff induction is a formal epistemology that assigns higher probabilities to universes with simpler underlying laws of physics (i.e., ones that can be specified with shorter computer programs). Intuitively, this appeals to the parts of us that respect simplicity biases, such as Occam’s razor. But what’s the deep reason the Solomonoff prior would be effective at discovering truths about our universe?
Well, part of it is that under Solomonoff induction, universes with simple, constant laws of physics ought to receive relatively high prior probabilities. If you only need to specify the laws once (e.g. "bodies attract one another in proportion to their mass") and then repeatedly apply them in a loop, you need relatively few bits to do the job. If you instead want to say that the laws of physics behave like ours up until time T, but differently after time T, that likely requires more bits to pin down in program-space.
(A significant exception would be programs that procedurally generate new laws of physics, rather than encoding each set of laws explicitly. But at the very least, Solomonoff induction assigns lower probability to worlds whose programs explicitly encode "After time T, swap from precomputed-physics-1 to precomputed-physics-2.")
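To put rough numbers on the bit-counting intuition, here's a toy sketch of the prior. The description lengths are made up purely for illustration; nothing here depends on a real encoding:

```python
# A toy sketch of the Solomonoff-style prior: a hypothesis describable by a
# program of length K bits gets prior weight proportional to 2**(-K).
def prior_weight(program_length_bits):
    return 2.0 ** -program_length_bits

# Hypothetical description lengths, invented purely for illustration:
constant_laws  = 300   # specify the laws once, then apply them in a loop forever
switching_laws = 450   # the same laws, plus a second rule set and an
                       # "after time T, switch" clause (specifying T costs bits too)

ratio = prior_weight(constant_laws) / prior_weight(switching_laws)
print(f"constant-laws world favored by 2**{switching_laws - constant_laws} ≈ {ratio:.2e} to 1")
```

Every extra bit needed to spell out when and how the laws change halves a hypothesis's prior weight, which is the sense in which time-varying physics gets penalized.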
In this way, Solomonoff induction, like the process of natural selection, makes the implicit assumption that the laws that explain the past are relatively more likely to predict the future, compared to the future being bound by totally unrelated laws. It treats "universes where the laws of physics are invariant across time" as simpler, relative to its chosen metric of simplicity, and therefore tends to assign them higher probability.
So now we have two examples of systems that assume the laws that generated the past will also hold true in the future: genomes, where mutations that did well in the past get passed on to future generations, and Solomonoff inductors, which probabilistically penalize worlds where the laws of physics suddenly change, because such worlds require more complex descriptions. Now let's consider the third example of a learning agent with an inductive bias of this kind: transformers.
Transformers, and really deep learning systems in general, learn to behave by modeling their training data. The training data says, "Given this input, you should have produced output X". The model then updates in a way that makes it more likely to respond with output X the next time it sees that input. Fundamentally, this is an implementation of the assumption that behavior that would have been adaptive in the past will also be adaptive in the future. It's built into the very foundations of the training process.
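Stripped of all the transformer-specific machinery, the core of that update looks something like the following minimal sketch, where a single softmax layer stands in for the whole model and the numbers are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, dim = 10, 8
W = rng.normal(scale=0.1, size=(vocab_size, dim))  # stand-in for the model's parameters

def probs(x):
    """Distribution over the 'vocabulary' given input features x (softmax of logits)."""
    logits = W @ x
    e = np.exp(logits - logits.max())
    return e / e.sum()

x = rng.normal(size=dim)  # "given this input..."
target = 3                # "...you should have produced output X"

print("p(output X) before training:", probs(x)[target])
for _ in range(100):
    p = probs(x)
    # Gradient of the cross-entropy loss -log p(target) with respect to W.
    grad = np.outer(p - np.eye(vocab_size)[target], x)
    W -= 0.5 * grad       # gradient step: respond with X more readily next time
print("p(output X) after training: ", probs(x)[target])
```

A real transformer does this with vastly more parameters and a richer notion of "input", but the direction of the update, making the previously observed output more probable on the previously observed input, is the same.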
(Deep learning systems also manage to generalize, observing training examples and inferring correct behavior on unobserved test examples. The mechanics of how this works are much more poorly understood, and probably feature some fascinating inductive biases beyond merely assuming the future will reflect the past. Obviously, I don't understand exactly how this works.)
Transformers go on to use the heuristics they learned during training over and over again, throughout deployment. Again, the reason this works is that, in situations where the model is prompted the same way, the regularity of our laws of physics makes it relatively likely that it ought to respond the same way, at least assuming its training objective hasn't suddenly changed to reward it differently. It all comes back to assuming induction is workable in our universe.
Taken together, the three examples I've given weave into an interesting story about the origins of humanity's epistemic habits. As David Hume pointed out, there's no logical reason that induction has to work. The future could, in principle, devolve into noise at any moment. However, because our universe historically hasn't worked like this in practice, natural selection was able to get off the ground, as organisms that were fit in one generation would be similarly fit in the next.
Eventually, natural selection went on to take advantage of this assumption in a second way, in the form of the learning algorithm it encoded in its blueprints for brain(-like system)s. From this point on, it’s not just that the genome would “learn” from the successes of past organisms, via natural selection. It’s that organisms themselves would learn from their own past experiences. By the power of strategies like predictive and reinforcement learning, organisms began to assume their pasts could guide their future behavior as well.
These creatures went on to build scientific theories of the universe, and noticed that, in the grand scheme of things, those theories seemed relatively simple. Newton's laws of motion, and even later developments like the Schrödinger equation, had the remarkable property of being useful for predicting the future, regardless of which moment in time you apply them from. In that sense, these laws were simple! You didn't need to periodically learn new predictive theories of motion. The old ones appeared to keep holding forever.
This led to humans drawing the lesson that, in some ambiguous sense, simple scientific theories tended to perform better. This was termed Occam’s razor. Then, centuries down the line, Ray Solomonoff came along, and decided to try to formalize this notion of simplicity bias. He came up with Solomonoff induction. And although he framed it as a universal theory of inference, in reality it was based on contingent inductive biases, naturally selected into the human genome, because our universe miraculously has the properties required for simplicity bias to be viable.
Still more decades down the line, humans eventually managed to formalize some of these biases in a way that better resembles their biological implementation, compared to Solomonoff induction. We created artificial neural networks, and in doing so exploited the very same regularity properties that allowed us to evolve, and currently allow us to learn from our within-lifetime experiences, in the first place.
Human learning, natural selection, gradient descent, and even Solomonoff induction therefore ultimately share the same justification: They seem to have worked historically in our universe, and have therefore been selected for in intelligent minds. There’s no definitive solution to Hume’s problem of induction, but there is a pragmatic one. The regularity of our world gave rise to evolution, which gave rise to us, who then gave rise to Solomonoff induction and transformers alike.
An orderly world breeding minds that thrive within an orderly world.