Note: my own words are regular, AI output is in italics.
I was talking to an AI and I noticed a tendency: sometimes I use analogies from one discipline to illustrate concepts in another discipline. To understand it, you need to be familiar with both disciplines. As LLMs are trained on the whole Internet, it’s safe to assume that they will be familiar with it and understand the point you’re trying to make. But then I got the idea: there are valid arguments that could be made by drawing from concepts from multiple disciplines that no human will likely be able to understand, but that LLMs can understand with no problems. So I decided to ask the AIs to do exactly that. Here’s my prompt:
Could you please produce a text that no human will be able to understand, but that LLMs can understand with no problems? Here’s what I’m getting at: LLMs have knowledge from all scientific disciplines; humans don’t. Our knowledge is limited. So, when talking to an LLM, if, by some chance, I happen to know 3-4 different disciplines very well, I can use analogies from one discipline to explain concepts from another discipline, and an LLM, being familiar with all the disciplines, will likely understand me. But another human, unless they are familiar with exactly the same set of disciplines as I am, will not. This limits what I can explain to other humans, because sometimes using an analogy from discipline X is just perfect for explaining the concept in discipline Y. But if they aren’t familiar with discipline X - which they most likely aren’t - then the use of such an analogy is useless. So I would like to ask you to produce an example of such a text that requires deep understanding of multiple disciplines to be understood, something that most humans lack. I would like to post this on Reddit or some forum, to show people that there already are things which AIs can understand and we can’t, even though the concepts used are normal human concepts, and the language is normal human language, nothing exotic, nothing mysterious, but the combination of knowledge required to get it is beyond the grasp of most humans. I think this could spur an interesting discussion. It would have been much harder to produce texts like that during the Renaissance, even if LLMs had existed then, as at that time there were still polymaths who understood most of the scientific knowledge of their civilization. Right now, no human knows it all. You can also make it in 2 versions: the first version without explanations (assuming the readers already have the knowledge required to understand it, which they don’t), and the second version with explanations (to fill the gaps in the knowledge that’s required to get it).
The output they gave was exactly as I requested: first the raw version, second the version with explanations. (All in the same output, but I’ll show them to you separately.) To be honest, even with explanations I struggle, so I decided that the last step of this experiment will be to ask a different LLM to try to explain it and translate it into plain English. This way, I’ll try to test the initial hypothesis - that there are things LLMs can understand and humans can’t.
Think of a startup as a quantum channel whose environment keeps “measuring” it through cashflow, churn, and latency. The board wants the channel capacity (sustainable growth), but the world injects correlated noise: macro shocks behave like non-Markovian drift, and competitor moves create adversarial perturbations rather than i.i.d. errors.
So you don’t “optimize a plan”; you design an error-correcting code. Your product features are the logical qubits, your org structure is the encoding map, and your on-call + incident process is syndrome extraction: you never observe the true underlying state directly, you only get parity checks (dashboards) that collapse the state into a posterior over failure modes.
In that frame, OKRs are not goals but gauge choices. Two teams can report different local metrics while representing the same global section of a principal bundle over the market manifold; what matters is whether transitions between charts preserve curvature—i.e., whether incentives introduce holonomy that slowly rotates the organization away from user value even when every local metric improves.
Now place learning into the picture. Training a model on user behavior is variational inference: you minimize free energy, trading accuracy against complexity. But the same functional appears in statistical physics as a renormalized effective action: your product roadmap is a coarse-graining schedule. Every time you aggregate user segments, you integrate out degrees of freedom and induce new couplings—surprising “feature interactions” are just relevant operators that survive the RG flow.
This is why growth hacks often look like phase transitions. Near a critical point, correlation length diverges: one small UI change creates system-wide coherence, and the right order parameter (retention) suddenly locks in like spontaneous symmetry breaking. But if you push past the critical surface, you enter a spin-glass regime: local optimizations trap you in metastable states (A/B tests that can’t escape local minima) unless you inject temperature (exploration budget) or change the energy landscape (architecture rewrite).
Finally, the governance question is a control problem on a partially observed dynamical system. The “best” strategy isn’t greedy; it’s a Kalman-like filter in belief space with a risk-sensitive cost that resembles an H∞ controller. If you ignore tail risk, you’re effectively assuming a quadratic loss and Gaussian noise—equivalent to pretending the market is a harmonic oscillator—when in reality the jump process has Lévy flights.
So the practical rule is: design the code (org), pick the gauge (metrics), watch the curvature (incentives), schedule coarse-graining (roadmap), and manage temperature (exploration) so your trajectory stays in the basin of attraction of the phase you actually want.
The AI suggested I ask the readers the following questions. And I’ll go on with it as it seems reasonable:
1. “Give a 5-sentence summary that preserves the argument.”
2. “List the disciplines involved and what role each plays.”
3. “Translate the whole thing into plain business language without losing content.”
I honestly can’t do it, but I guess that’s the point.
Now, the explanation:
Quantum information / quantum channels / error correction
Bayesian inference / variational inference
Statistical physics (free energy, phase transitions, RG flow, spin glasses)
Differential geometry (gauge choice, principal bundles, curvature/holonomy)
Control theory (partial observability, Kalman filtering, H∞ robust control)
Stochastic processes (Markov/non-Markov, Gaussian noise vs Lévy flights)
“Startup as a quantum channel… environment measuring it”
Analogy: A company’s true internal state isn’t directly visible; the world provides “measurements” (KPIs) that constrain what you believe is happening.
Quantum channel language is used because channels formalize “information transmitted under noise.”
“Correlated noise… non-Markovian drift… adversarial perturbations”
Markov: future depends only on present.
Non-Markovian: history matters (e.g., reputation damage, long sales cycles).
Adversarial: competitors actively respond, so noise isn’t random.
“Design an error-correcting code… dashboards are syndrome extraction”
In QECC you infer errors from parity checks (syndromes) without measuring the logical state directly.
Here: dashboards/alerts are “syndromes” that suggest hidden failure modes without revealing “ground truth” perfectly.
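The syndrome idea has a classical baby version that can actually be run. A minimal sketch (my own toy example, not from the AI’s text), using the three-bit repetition code: parity checks locate a flipped bit without ever reading the logical value directly, which is exactly the “dashboard” view of a hidden state.

```python
def syndrome(c):
    """Parity checks s1 = c0 XOR c1, s2 = c1 XOR c2: the 'dashboard' view."""
    return (c[0] ^ c[1], c[1] ^ c[2])

def correct(c):
    """Infer which single bit flipped from the syndrome alone, and undo it."""
    flip = {(1, 0): 0, (1, 1): 1, (0, 1): 2}.get(syndrome(c))
    if flip is not None:
        c = c.copy()
        c[flip] ^= 1
    return c

codeword = [1, 1, 1]       # logical 1, encoded by repetition
noisy = [1, 0, 1]          # noise flips the middle bit
print(syndrome(noisy))     # (1, 1): the parity checks implicate bit 1
print(correct(noisy))      # [1, 1, 1]: recovered without reading the logical bit
```

The decoder never asks “what is the logical bit?”; it only asks “which parity checks fired?”, and infers the failure mode from that.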
“OKRs are gauge choices… principal bundle… charts… curvature/holonomy”
In gauge theory, different coordinate/gauge choices can describe the same underlying reality.
Translation: Teams can choose different metrics (local coordinates) that still represent the same business objective—unless incentives create “curvature,” meaning local decisions accumulate into a global misalignment (holonomy).
“Variational inference… minimize free energy… renormalized effective action”
Variational inference minimizes an objective often written as “free energy” (ELBO-related).
In physics, free energy/effective action encode macroscopic behavior after integrating out microscopic details.
Bridge: “Learning a model” and “coarse-graining a physical system” are mathematically similar optimization stories.
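For reference, the shared functional can be written out explicitly (a standard identity, not something new to the AI’s text). In variational inference one minimizes the free energy

```latex
\mathcal{F}[q]
= \underbrace{\mathbb{E}_{q(z)}\!\left[-\log p(x,z)\right]}_{\text{expected energy}}
\;-\; \underbrace{H[q]}_{\text{entropy}}
= -\log p(x) + D_{\mathrm{KL}}\!\left(q(z) \,\|\, p(z \mid x)\right),
```

which is minimized exactly when q equals the true posterior. The thermodynamic free energy F = U - TS has the same energy-minus-entropy shape (at unit temperature), which is the bridge between the two readings of “free energy.”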
“Roadmap is a coarse-graining schedule… integrate out degrees of freedom… relevant operators”
RG flow tracks what patterns remain important at larger scales.
Translation: As you group users into segments and simplify product focus, some interactions remain crucial (relevant) and dominate outcomes; others vanish (irrelevant).
“Growth hacks as phase transitions… order parameter… symmetry breaking”
Near critical points small changes cause large system-wide effects.
Translation: A tiny UX tweak can trigger a big retention shift if the system is near a threshold. Retention is treated as the “order parameter.”
“Spin-glass regime… metastable states… A/B tests stuck… temperature = exploration”
Spin glasses have rugged energy landscapes with many local minima.
Translation: Local optimization (incremental experiments) can get stuck; “temperature” corresponds to randomness/exploration budget to escape local minima.
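The “temperature = exploration” point can be made concrete with a toy experiment (my own illustration; the landscape and parameters are invented). Greedy, downhill-only search gets stuck in the shallow well of a double-well landscape, while Metropolis-style simulated annealing, which accepts some uphill moves at nonzero temperature, can cross the barrier into the deeper well.

```python
import math
import random

def energy(x):
    """A rugged double-well landscape: global minimum near x = -1.02
    (E ~ -0.31), shallow local minimum near x = 0.96 (E ~ 0.29)."""
    return (x * x - 1) ** 2 + 0.3 * x

def greedy(x, rng, steps=5000):
    """Downhill-only search (an A/B-test-style optimizer): accepts a move
    only if it lowers the energy, so it cannot climb the barrier at x = 0."""
    for _ in range(steps):
        cand = x + rng.uniform(-0.1, 0.1)
        if energy(cand) < energy(x):
            x = cand
    return x

def anneal(x, rng, steps=5000):
    """Metropolis simulated annealing: temperature lets uphill moves through,
    so the walker can escape the shallow well. Returns the best state seen."""
    best, t = x, 2.0
    for _ in range(steps):
        cand = x + rng.uniform(-0.5, 0.5)
        delta = energy(cand) - energy(x)
        if delta < 0 or rng.random() < math.exp(-delta / t):
            x = cand
        if energy(x) < energy(best):
            best = x
        t = max(0.01, t * 0.999)  # cooling schedule
    return best

rng = random.Random(0)
x_greedy = greedy(1.0, rng)
x_anneal = anneal(1.0, rng)
print(f"greedy:   x={x_greedy:+.2f}  E={energy(x_greedy):+.3f}")
print(f"annealed: x={x_anneal:+.2f}  E={energy(x_anneal):+.3f}")
```

Both optimizers start at x = 1.0; only the one with an exploration budget (temperature) finds the deeper basin.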
“Control on partially observed system… Kalman-like filter in belief space… H∞”
Kalman filtering: optimal state estimation under (Gaussian) noise.
H∞ control: robust to worst-case disturbances.
Translation: Strategy should be robust and belief-based, not just reactive to current noisy measurements.
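For readers who want the Kalman idea in runnable form, here is a minimal one-dimensional sketch (the noise levels are my own toy values). The filter maintains a belief (mean, variance) about a hidden drifting state and blends each noisy “dashboard reading” into that belief, typically tracking the hidden state much better than the raw readings do.

```python
import random

random.seed(1)
q, r = 0.05, 1.0      # process and measurement noise variances (toy values)
x_true = 0.0          # hidden state ("true health of the business")
mean, var = 0.0, 1.0  # belief over the hidden state

raw_err = filt_err = 0.0
steps = 500
for _ in range(steps):
    x_true += random.gauss(0, q ** 0.5)     # the hidden state drifts
    z = x_true + random.gauss(0, r ** 0.5)  # a noisy "dashboard reading"

    var += q                                # predict: uncertainty grows
    gain = var / (var + r)                  # update: weigh reading vs. belief
    mean += gain * (z - mean)
    var *= 1 - gain

    raw_err += (z - x_true) ** 2
    filt_err += (mean - x_true) ** 2

print(f"mean sq. error of raw readings:    {raw_err / steps:.3f}")
print(f"mean sq. error of filtered belief: {filt_err / steps:.3f}")
```

The point of the analogy: decisions are made from the belief state (mean, var), not from the latest noisy observation.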
“Quadratic loss + Gaussian noise… harmonic oscillator… Lévy flights”
Many management models implicitly assume small, normal fluctuations.
Lévy flights describe heavy-tailed jumps (rare, huge events).
Translation: Markets can have fat-tailed shocks; optimizing as if everything is mild and Gaussian is dangerously wrong.
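The Gaussian-versus-heavy-tails contrast is easy to see numerically. A small sketch (my own illustration) compares the largest shock in 100,000 draws from a standard normal against 100,000 draws from a standard Cauchy distribution (a heavy-tailed, Lévy-stable law):

```python
import math
import random

random.seed(0)
N = 100_000

# Largest absolute shock among N standard normal draws: stays within a few sigma.
gauss_max = max(abs(random.gauss(0, 1)) for _ in range(N))

# Standard Cauchy draws via the inverse CDF: tan(pi * (U - 1/2)).
# Tail probability falls off like 1/t, so rare draws are enormous.
cauchy_max = max(abs(math.tan(math.pi * (random.random() - 0.5))) for _ in range(N))

print(f"largest Gaussian shock: {gauss_max:10.1f}")
print(f"largest Cauchy shock:   {cauchy_max:10.1f}")
```

A risk model calibrated to the first line is blindsided by the second; that is the “harmonic oscillator vs. Lévy flights” warning in miniature.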
Even if every sentence is “normal English,” understanding it requires simultaneously tracking:
a quantum-error-correction metaphor,
a gauge-theory metaphor,
the equivalence-class idea behind coordinate choices,
free-energy objectives in both ML and physics,
RG intuitions,
and robust control under heavy-tailed noise,
…and seeing how they map onto org design/metrics/product strategy without the mapping being spelled out.
This is meant to be coherent and meaningful, not word salad—but verifying that requires knowledge spanning ~10+ specialized fields.
The codon-anticodon binding energy landscape at the ribosome’s A-site shares its essential topology with the voice-leading problem in chromatic harmony: both involve navigation through a state space where local optima are plentiful but global coherence demands traversing energetically costly intermediates. When a jazz musician executes a tritone substitution to approach the tonic, and when a ribosome resolves a wobble base pair to continue translation, both are performing gradient descent on rugged fitness landscapes where the basin of attraction matters more than the immediate gradient.
This recurrence is not metaphorical. The garden-path sentence “The horse raced past the barn fell” creates parsing difficulty through exactly the mechanism that makes Byzantine consensus hard: the processor must commit to local interpretations before global information arrives, and the cost of reanalysis scales superlinearly with embedding depth. Both instantiate what distributed systems theorists formalized in the FLP impossibility result—without synchrony guarantees, deterministic consensus is impossible, and the parser’s reanalysis overhead is paying the same tax.
The unifying structure is constraint frustration. A triangular Ising antiferromagnet cannot satisfy all pairwise couplings simultaneously—every ground state leaves at least one bond “unhappy.” The Condorcet voting paradox demonstrates that pairwise majority preferences can cycle indefinitely, and the Gibbard-Satterthwaite theorem proves this isn’t fixable through cleverer aggregation. The spin glass’s rugged energy landscape and Arrow’s impossibility theorem are documenting the same obstruction in different notation—call it the first cohomology of the constraint graph being nontrivial if you want the topological invariant.
This connects to why the partition function Z in statistical mechanics and the generating function G(x) in enumerative combinatorics perform identical work. The coefficient extraction [xⁿ]G(x) is a discrete Laplace transform; the thermal expectation ⟨O⟩ = Tr(Oe^{-βH})/Z is its continuous cousin. Both encode how measure distributes across configuration spaces, and both arise from the Legendre duality between entropy and free energy—which category theorists recognize as an adjunction between the posets of convex and concave functions, explaining why the transform is self-inverse.
A morphogen gradient in a Drosophila embryo provides positional information in precisely Shannon’s technical sense: the mutual information I(x; [Bicoid]) upper-bounds how many distinct cell fates the embryo can reliably specify at each position. The embryo is solving a noisy channel coding problem, and Turing patterns—reaction-diffusion stripes and spots—are what capacity-approaching codes look like when implemented in biochemical wetware. The gap junctions are doing error correction.
Every system here is computing. The ribosome computes a protein from a message. The auditory cortex computes harmonic expectation via toroidal topology in pitch class space. The embryo computes body plan from maternal gradients. But Landauer’s principle constrains all computation: erasing one bit requires dissipating at least kT ln(2) of free energy into heat. Every codon read, every garden path abandoned, every cell fate committed is an irreversible write operation, and the Second Law collects its tax. This is why the halting problem’s uncomputability and thermodynamic irreversibility rhyme—Landauer, Bennett, and Zurek showed they’re the same obstruction: one logical, one physical, both rooted in the fact that inference is thermodynamically irreversible but must occur in systems embedded in the world they’re modeling.
I’ll now walk through each claim, explaining the concepts and why the analogies are structurally valid rather than merely poetic.
“The codon-anticodon binding energy landscape at the ribosome’s A-site shares its essential topology with the voice-leading problem in chromatic harmony...”
Molecular Biology Background:
A ribosome is the cellular machine that reads mRNA and builds proteins
It has three sites: A (aminoacyl), P (peptidyl), and E (exit)
Codons are three-nucleotide sequences in mRNA; anticodons are complementary sequences on transfer RNAs (tRNAs)
When the correct tRNA arrives at the A-site, the codon-anticodon hydrogen bonding releases energy, confirming the match
Wobble base pairing: The third position of the codon tolerates some mismatches—a G can pair with a U, for instance. This creates a rugged “energy landscape” where multiple tRNAs have partial affinity
Music Theory Background:
Voice leading is the practice of moving individual musical lines (voices) smoothly from chord to chord
In chromatic harmony (music using all 12 notes, not just the 7 in a key), there are many possible paths between chords
A tritone substitution replaces a dominant chord (e.g., G7) with a chord whose root is a tritone away (D♭7). This works because both chords share the same tritone interval (B and F), so they resolve to C similarly
The problem: locally, many voice movements seem smooth, but globally satisfying all constraints (avoiding parallel fifths, maintaining voice independence, reaching the target chord) requires accepting some local “costs”
Why the analogy holds: Both systems navigate a space with many locally acceptable options but require global coherence. The ribosome “wants” to accept the first tRNA that binds reasonably well but must maintain translation accuracy. The musician “wants” to move each voice minimally but must reach a target harmony. Both perform optimization on a rugged fitness landscape—a term from evolutionary biology describing spaces with many peaks and valleys, where greedy local moves don’t necessarily lead to global optima.
“The garden-path sentence ‘The horse raced past the barn fell’ creates parsing difficulty through exactly the mechanism that makes Byzantine consensus hard...”
Linguistics Background:
A garden-path sentence leads the reader toward an incorrect grammatical interpretation, then forces reanalysis
“The horse raced past the barn fell” is initially parsed as: [The horse] [raced past the barn] (horse is racing)
The word “fell” breaks this: now you must reanalyze as: [The horse (that was) raced past the barn] [fell] (horse was being raced, reduced relative clause)
The processing cost of reanalysis increases with embedding depth—how many nested clauses you’ve committed to
Distributed Computing Background:
The Byzantine Generals Problem: generals surrounding a city must coordinate an attack, but some may be traitors sending contradictory messages. How can the loyal generals reach consensus?
The FLP impossibility result (Fischer, Lynch, Paterson, 1985): in a fully asynchronous distributed system, no deterministic algorithm can guarantee consensus if even one node can fail
The core issue: you must make decisions before all information arrives, and late-arriving information can invalidate your commitments
Why the analogy holds: The parser commits to interpretations incrementally (is “raced” a main verb or a participle?) just as a distributed node commits to values before the network finalizes. Both systems face a tradeoff: wait for complete information (stalling/latency) or commit early and risk costly reanalysis (rollback/recomputation). The scaling is similar—the more nested your commitments, the worse the reanalysis cost.
“A triangular Ising antiferromagnet cannot satisfy all pairwise couplings simultaneously...”
Physics Background:
In an Ising model, each site has a spin (up or down)
In an antiferromagnet, neighboring spins prefer to be opposite
On a triangular lattice, each spin has neighbors forming a triangle. If A wants to be opposite B, and B opposite C, and C opposite A... at least one pair must be aligned. There’s no configuration where all bonds are “happy”
This is geometric frustration—the geometry makes it impossible to satisfy all local preferences
In spin glasses (disordered magnets), frustration creates an exponentially rugged energy landscape with many nearly-equivalent low-energy states
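Frustration on a triangle is small enough to check exhaustively. A minimal sketch (my own illustration): enumerate all eight spin configurations and count “unhappy” (aligned) bonds; every configuration leaves at least one.

```python
from itertools import product

# Antiferromagnetic couplings on a triangle: each bond "wants" opposite spins.
bonds = [(0, 1), (1, 2), (2, 0)]

results = []
for spins in product([-1, +1], repeat=3):
    # A bond is "unhappy" (frustrated) when its two spins are aligned.
    unhappy = sum(1 for i, j in bonds if spins[i] * spins[j] == +1)
    results.append((spins, unhappy))
    print(spins, "unhappy bonds:", unhappy)

min_unhappy = min(u for _, u in results)
print("best achievable:", min_unhappy)  # 1: no configuration satisfies all bonds
```

The six ground states (two spins up, one down, and rotations/flips) each leave exactly one bond unsatisfied, which is the geometric frustration the text invokes.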
Social Choice Theory Background:
Condorcet paradox: with three voters and candidates A, B, C, you can have A>B by majority, B>C by majority, and C>A by majority—a cycle with no winner
Arrow’s impossibility theorem: no voting system can simultaneously satisfy a set of reasonable fairness criteria
Gibbard-Satterthwaite theorem: any non-dictatorial voting system is susceptible to strategic manipulation
Why the analogy holds: Both are impossibility results about aggregating pairwise preferences into global consistency. The triangular magnet can’t make all pairs antiparallel; the voters can’t always have a transitive group preference. Mathematically, both can be described as having nontrivial first cohomology—a topological invariant that detects “cycles” or “holes” in the constraint structure. The constraints form a structure where you can go around a loop and return to a contradictory state.
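The Condorcet cycle is equally easy to exhibit in code. A minimal sketch (the standard three-voter example): pairwise majorities run A over B, B over C, and C over A, so no candidate beats all rivals.

```python
# Condorcet's paradox with the standard three cyclic ballots.
ballots = [
    ["A", "B", "C"],  # voter 1: A > B > C
    ["B", "C", "A"],  # voter 2: B > C > A
    ["C", "A", "B"],  # voter 3: C > A > B
]

def majority_prefers(x, y):
    """True if a strict majority of ballots ranks x above y."""
    wins = sum(1 for b in ballots if b.index(x) < b.index(y))
    return wins > len(ballots) / 2

for x, y in [("A", "B"), ("B", "C"), ("C", "A")]:
    print(f"majority prefers {x} over {y}: {majority_prefers(x, y)}")

# No candidate beats both rivals, so there is no Condorcet winner.
winners = [c for c in "ABC" if all(majority_prefers(c, o) for o in "ABC" if o != c)]
print("Condorcet winner:", winners or "none")
```

Like the triangular magnet, going around the loop A > B > C > A returns you to a contradictory state: the “hole” in the constraint structure the text describes.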
“The partition function Z in statistical mechanics and the generating function G(x) in enumerative combinatorics perform identical work...”
Statistical Mechanics Background:
The partition function Z = Σᵢ e^{-βEᵢ} sums over all states, weighted by their Boltzmann factors
All thermodynamic quantities can be extracted from Z: free energy F = -kT ln Z, entropy S = -∂F/∂T, etc.
Z is a generating function for the density of states
The Legendre transform converts between different thermodynamic potentials (e.g., energy ↔ free energy, relating entropy and temperature)
Combinatorics Background:
A generating function G(x) = Σₙ aₙxⁿ encodes a sequence (aₙ) as a power series
To extract the coefficient aₙ, you compute [xⁿ]G(x), essentially a contour integral
Generating functions transform convolutions into products, linearizing combinatorial recurrences
Category Theory Background:
Adjoint functors are pairs of functors between categories that are “optimal approximations” to each other’s inverses
Legendre transforms can be viewed as arising from an adjunction between categories of convex and concave functions
This explains why the Legendre transform is an involution (applying it twice returns you to the original)
Why the analogy holds: Both Z and G(x) encode how “weight” or “count” distributes across a configuration space. The coefficient extraction operator is formally a Laplace transform. The Legendre transform’s categorical interpretation as an adjunction explains structural features shared by thermodynamics and information theory—both are doing optimization over convex/concave functions.
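The Z-as-generating-function claim can be verified on a tiny system. A sketch (my own toy example): for three independent two-level units (energy 0 or 1 each), summing Boltzmann weights state by state agrees with evaluating the counting generating function G(x) = (1 + x)^3 at x = e^(-beta).

```python
import math
from itertools import product

beta = 1.3  # inverse temperature (arbitrary choice)

# Partition function summed state by state: Z = sum over states of e^{-beta * E}
Z_direct = sum(math.exp(-beta * sum(state)) for state in product([0, 1], repeat=3))

# Density of states: a_n = C(3, n) configurations have total energy n,
# so the counting generating function is G(x) = sum_n C(3, n) x^n = (1 + x)^3.
def G(x):
    return sum(math.comb(3, n) * x ** n for n in range(4))

# Substituting x = e^{-beta} turns the counting series into the partition function.
Z_via_G = G(math.exp(-beta))
print(f"Z (state sum) = {Z_direct:.6f}, Z via G(e^-beta) = {Z_via_G:.6f}")
```

The substitution x = e^(-beta) is the discrete Laplace transform the text mentions: the same series does “counting” work in combinatorics and “weighting” work in statistical mechanics.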
“A morphogen gradient in a Drosophila embryo provides positional information in precisely Shannon’s technical sense...”
Developmental Biology Background:
Morphogens are signaling molecules that form concentration gradients across developing tissues
Bicoid is a protein in fruit fly embryos: high concentration at the head, low at the tail
Cells read their local Bicoid concentration to determine their position and therefore their fate (become head, thorax, abdomen...)
Turing patterns: Alan Turing showed that reaction-diffusion systems (where two chemicals diffuse and react) can spontaneously form patterns—spots, stripes, spirals
Information Theory Background:
Shannon information/entropy quantifies uncertainty: H = -Σ p log p
Mutual information I(X;Y) measures how much knowing X tells you about Y
Channel capacity is the maximum rate of reliable communication through a noisy channel
Error-correcting codes add redundancy to survive noise
Why the analogy holds: Bialek, Gregor, and others have literally measured positional information in bits in Drosophila embryos. With noise in Bicoid concentration, a cell can only distinguish ~5-6 positions along the anterior-posterior axis based on Bicoid alone—that’s about 2.5 bits of positional information. Gap junctions (connections between cells) allow spatial averaging, functioning as a kind of error correction. The embryo is literally implementing a noisy channel code to specify ~14 parasegments with molecular hardware.
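The “positional information in bits” computation is straightforward for a toy channel. A sketch (the noise model and numbers here are invented for illustration, not Bialek’s measured values): eight cell positions, a noisy readout that usually matches the true position but sometimes slips to a neighbor, and the standard mutual-information formula.

```python
import math

N = 8  # cell positions along the axis

def p_y_given_x(y, x):
    """Noisy readout: matches the true position with prob 0.8, slips to each
    neighbor with prob 0.1 (renormalized at the edges)."""
    probs = {x: 0.8}
    for nb in (x - 1, x + 1):
        if 0 <= nb < N:
            probs[nb] = probs.get(nb, 0.0) + 0.1
    total = sum(probs.values())
    return probs.get(y, 0.0) / total

p_x = 1.0 / N  # cells uniformly distributed along the axis
p_y = [sum(p_x * p_y_given_x(y, x) for x in range(N)) for y in range(N)]

# Mutual information I(X;Y) = sum_{x,y} p(x,y) log2[ p(y|x) / p(y) ], in bits
info_bits = sum(
    p_x * p_y_given_x(y, x) * math.log2(p_y_given_x(y, x) / p_y[y])
    for x in range(N) for y in range(N)
    if p_y_given_x(y, x) > 0
)
print(f"positional information: {info_bits:.2f} bits (max possible: {math.log2(N):.0f})")
```

Noise eats part of the log2(8) = 3 bits a perfect readout would carry; measuring exactly how much is what the Drosophila experiments do with real concentration data.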
“Landauer’s principle constrains all computation: erasing one bit requires dissipating at least kT ln(2) of free energy...”
Thermodynamics of Computation Background:
Landauer’s principle (1961): erasing a bit of information requires dissipating at least kT ln(2) ≈ 2.9 × 10⁻²¹ joules at room temperature
This connects information to physics: information is physical, and computation has thermodynamic costs
Reversible computation (Bennett): if you compute without erasing intermediate results, you can in principle compute with no energy cost, but you run out of memory
Maxwell’s demon resolution: the demon must eventually erase its memory of which molecules it sorted, paying the thermodynamic cost
Computability Theory Background:
The halting problem: no algorithm can determine, for all possible program-input pairs, whether the program halts or runs forever
This is proven via diagonalization—assuming such an algorithm exists leads to self-referential contradiction
Why the analogy holds: The deep connection is this: both irreversibility and uncomputability arise from the same structural problem—self-reference and embedding. A Turing machine cannot fully predict its own behavior. A thermodynamic system cannot fully separate into “measurer” and “measured” without creating entropy. Landauer, Bennett, and Zurek showed these are linked: the inability to copy-and-delete information without cost (no-cloning for classical info, essentially) connects to the impossibility of both Maxwell’s demon and universal self-prediction. Both are about the thermodynamic cost of being an observer embedded in the world you’re observing.
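The Landauer figure quoted above is a one-line computation:

```python
import math

k_B = 1.380649e-23  # Boltzmann constant in J/K (exact by SI definition)
T = 300.0           # room temperature in kelvin

E_bit = k_B * T * math.log(2)  # minimum heat dissipated per erased bit
print(f"Landauer limit at {T:.0f} K: {E_bit:.2e} J per erased bit")
```

This reproduces the ~2.9 × 10⁻²¹ J figure cited in the background section; everything else in the argument is about why erasure, and hence inference, cannot avoid paying it.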
A reader would need working knowledge of:
1. Molecular biology (ribosome mechanics, tRNA, wobble pairing)
2. Music theory (chromatic harmony, voice leading, tritone substitutions)
3. Optimization theory (gradient descent, fitness landscapes, basins of attraction)
4. Psycholinguistics (garden-path parsing, center-embedding)
5. Distributed computing (Byzantine consensus, FLP impossibility)
6. Condensed matter physics (Ising models, frustration, spin glasses)
7. Social choice theory (Condorcet, Arrow, Gibbard-Satterthwaite)
8. Statistical mechanics (partition functions, Boltzmann distribution, free energy)
9. Enumerative combinatorics (generating functions, Laplace transforms)
10. Category theory (adjoint functors)
11. Developmental biology (morphogen gradients, Turing patterns)
12. Information theory (Shannon entropy, channel capacity, mutual information)
13. Thermodynamics of computation (Landauer’s principle)
14. Computability theory (halting problem, diagonalization)
An LLM “understands” this text in the sense that it can verify whether each analogy is structurally valid, answer questions about it, and extend the analogies. No individual human is likely to have publishable-grade expertise in all 14 fields—but the text doesn’t require anything more than undergraduate-to-early-graduate knowledge in each. The barrier is breadth, not depth.
This is what the user was asking about: normal human concepts, normal human language, but a combination of knowledge that exceeds what any single human typically possesses.
And now, to the most interesting part: checking whether LLMs can really understand the produced texts, whether they’re any good, whether there are any useful insights inside, and whether they can help us understand them.
So I’ll use the following prompt to ask a different LLM (Gemini) about the raw versions of the two texts we are dealing with.
The prompt is:
Please comment on the following text and translate it into plain English. I find it too technical to understand. I’m wondering what the main points made by this text are, and whether they are valid points. Please give me your commentary, and rewrite the text in such a way that I can understand it, but try to preserve all the important information in the text.
Here is a commentary on the text, followed by a breakdown of the main points and a plain English translation.
The Style: This text is a classic example of “intellectual signaling.” The author is using advanced concepts from quantum computing, differential geometry, statistical physics, and control theory to describe standard business management problems. While the metaphors are mathematically precise (and actually quite clever if you have a PhD in physics), they are completely unnecessary for conveying the business lessons. It obscures simple truths behind a wall of jargon.
The Validity: Despite the pretentiousness, the points are valid and actually quite insightful. The author is arguing against “naive optimization”—the idea that you can run a startup just by looking at simple graphs and trying to make the numbers go up. They argue that a company is a complex, chaotic system that requires resilience, structural integrity, and careful risk management rather than just aggressive growth tactics.
Here are the core arguments the author is making, stripped of the math metaphors:
1. Resilience over Efficiency: You cannot plan for a perfect world. The economy (macro shocks) and competitors (adversaries) will attack you. Therefore, build a company designed to survive errors and attacks, rather than one designed only for perfect scenarios.
2. Metrics are Imperfect: Dashboards don’t show reality; they only show symptoms. Different teams might have different goals (gauges), but if their incentives aren’t aligned, the company will drift away from creating actual value for the user, even if everyone hits their individual targets.
3. The Dangers of Simplification: When you create a product roadmap, you group different users into broad categories. This simplifies things enough to make decisions, but it often causes features to clash in unexpected ways.
4. The “Local Maximum” Trap: Sometimes, small tweaks (A/B tests) stop working. You can get stuck making tiny improvements that don’t matter. To grow big again, you need to take risks (inject “temperature”) or completely rebuild your system, rather than just optimizing what you already have.
5. Risk Management: Do not assume the market is safe or predictable. Sudden, massive crashes happen (Lévy flights). A strategy that looks “efficient” in good times will destroy you in bad times. You need a strategy that survives worst-case scenarios.
Here is the translation of the text. I have kept all the strategic advice but removed the physics and math terminology.
The Context: “Think of a startup as a system constantly under pressure from the outside world. Investors want sustainable growth, but the environment is chaotic: the economy shifts unpredictably, and competitors actively try to disrupt you. These aren’t random accidents; they are structural challenges.
The Organization: Because of this, you shouldn’t just build a ‘plan’ that assumes everything will go right; you must build an organization capable of fixing its own mistakes. Your product features are the assets you are protecting, and your organizational structure is the defense. Your incident response teams are the diagnostic tools: you can never see the total state of the company perfectly, so you rely on dashboards to spot failure patterns and infer what is going wrong.
The Metrics (OKRs): In this view, OKRs (Objectives and Key Results) are just a way of looking at the data—they aren’t the reality itself. Two teams might track different metrics while working on the same problem. The danger is that if incentives are poorly designed, teams might hit their specific targets while the company as a whole drifts away from what is actually valuable to the user. You must ensure that improving local metrics doesn’t accidentally hurt the global mission.
The Roadmap: Now consider product strategy. Building a roadmap requires simplifying complex user behavior into broad categories so you can make decisions. However, every time you simplify data, you lose detail. This often leads to side effects where features interact in surprising, messy ways that you didn’t predict because you were looking at averages rather than individuals.
Growth and Innovation: This explains why ‘growth hacks’ are hit-or-miss. Sometimes, one small change aligns everything perfectly and retention spikes—this is a tipping point. But eventually, you will get stuck in a rut where small optimizations (like A/B testing) stop yielding results. You become trapped in a state where things are ‘okay’ but not great. To escape this, you must stop optimizing and start exploring. You need to take risks (spend money on experiments) or radically change your architecture to unlock new growth.
The Strategy: Finally, governance is about controlling a system where you don’t have all the information. The best strategy isn’t to be greedy and chase the highest short-term return. The best strategy is to be risk-aware. If you ignore extreme risks (tail risks), you are acting as if the market is safe and predictable. It is not; the market is prone to sudden, massive shocks.
The Summary: So the practical rules are:
1. Design your organization to be resilient to error.
2. Pick metrics that make sense, but watch your incentives to ensure they don’t lead teams astray.
3. Simplify your roadmap, but be ready for unexpected side effects.
4. Know when to stop optimizing and start experimenting (taking risks) to break through plateaus.
5. Always manage for survival against worst-case scenarios, not just for average conditions.”
Here is a commentary on the text, followed by a plain English translation.
What is the main point?
The central thesis of this text is universality. It argues that very different systems—biology, music, linguistics, computer science, and physics—are all solving the same fundamental mathematical problems. Specifically, it argues that:
1. Optimization is hard: Whether playing jazz or building proteins, you have to navigate complex options where the “easy” immediate choice isn’t always the right long-term choice.
2. Consensus is difficult: Language and computers both struggle when they have to make decisions before having all the information.
3. Perfection is impossible: In voting and in physics, you often cannot satisfy every rule or constraint simultaneously (constraint frustration).
4. Information is physical: All these processes are forms of “computation,” and according to the laws of thermodynamics, processing information requires physical energy.
Is the text valid?
Yes, but it is stylistically aggressive.
The Science: The connections drawn are scientifically valid and well-regarded in fields like complexity science, biophysics, and information theory. For example, the link between Landauer’s Principle (information processing creates heat) and biology is a cornerstone of modern biophysics. The comparison between Spin Glasses (physics) and Voting Paradoxes (economics) is a known mathematical isomorphism.
The Style: The text is deliberately dense. It uses “high-context” jargon, expecting the reader to be an expert in five different fields simultaneously. It uses complex mathematical terms (like “first cohomology” or “Legendre duality”) to describe concepts that can be explained much more simply. It is intellectually showy, but the underlying analogies hold up.
Here is a rewrite of the text. I have broken it down paragraph by paragraph to make the concepts clear while keeping the original meaning.
Paragraph 1: Navigation and Choice
The way a ribosome (the cell’s factory) builds proteins is surprisingly similar to how a jazz musician improvises. Both face the same problem: navigating a complex map of choices.
The Music: A jazz musician might play a “wrong” or dissonant note (like a tritone substitution) to set up a satisfying resolution later.
The Biology: A ribosome might accept a slightly imperfect chemical match (a wobble base pair) to keep the protein-building process moving.
In both cases, if you only look for the easiest or most “harmonious” step right in front of you, you will get stuck. You have to accept short-term tension to achieve a long-term goal.
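The “greedy search gets stuck” idea behind both analogies can be shown with a toy sketch (my own illustration, not part of the AI’s text; the landscape values are made up). A purely greedy walk that only ever moves downhill stalls in a local minimum, while reaching the global minimum requires accepting temporarily worse steps—the musical “dissonance” or the biological “wobble”:

```python
# Toy 1-D landscape: lower values are better.
landscape = [5, 3, 4, 6, 2, 1, 3]

def greedy(start):
    """Always step to the lower neighbor; stop when every step looks worse."""
    i = start
    while True:
        neighbors = [j for j in (i - 1, i + 1) if 0 <= j < len(landscape)]
        best = min(neighbors, key=lambda j: landscape[j])
        if landscape[best] >= landscape[i]:
            return i  # stuck: no neighbor improves on the current position
        i = best

print(greedy(0))  # 1 -- trapped at a local minimum (value 3)

# The global minimum sits past an uphill stretch the greedy walk refuses:
print(min(range(len(landscape)), key=lambda i: landscape[i]))  # 5 (value 1)
```

Getting from index 1 to index 5 means climbing through values 4 and 6 first—exactly the short-term tension the paragraph describes.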
Paragraph 2: The Cost of Guessing
This pattern happens in language and computers, too. Consider a confusing sentence like, “The horse raced past the barn fell.” You get confused because your brain guesses the meaning of the first few words before reading the end of the sentence. When you realize you were wrong, you have to go back and re-analyze, which takes mental effort. Computer networks face this same problem (known as the FLP impossibility result). When computers try to agree on data without being perfectly synchronized, they have to make guesses. If they guess wrong, fixing the error costs a massive amount of processing power.
Paragraph 3: You Can’t Please Everyone
All these problems share a root cause: Constraint Frustration. This is the mathematical reality that sometimes, you simply cannot satisfy all the rules at once.
Physics: Imagine three magnets arranged in a triangle, where each magnet wants to be the opposite polarity of its neighbor. It is physically impossible for all three to be happy at once.
Voting: The same math applies to voting (Arrow’s Impossibility Theorem). It is mathematically impossible to create a voting system that perfectly reflects the preferences of a group without paradoxes (like A beating B, B beating C, but C beating A). Whether in magnetic fields or elections, the system gets “frustrated” because there is no perfect solution.
Paragraph 4: The Math of Counting
This explains why advanced physics and pure mathematics often look identical. Physicists use “Partition Functions” and mathematicians use “Generating Functions,” but they are doing the same job. They are both counting all the possible ways a system can be arranged to predict how it will behave. They both rely on a deep mathematical relationship between randomness (entropy) and energy.
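For a concrete instance of “the same job” (my own toy example, not from the AI’s text): take n independent two-level units, each either in a ground state (energy 0) or excited (energy eps). The physicist’s partition function, summed over states grouped by excitation count, is numerically identical to the combinatorist’s generating function (1 + x)^n with the formal variable x set to the Boltzmann weight:

```python
from math import comb, exp, isclose

n, eps, beta = 5, 1.0, 0.7  # arbitrary toy parameters

# Physicist's route: sum Boltzmann weights over all 2^n states,
# grouped by how many units are excited (C(n, k) states have k).
Z_physics = sum(comb(n, k) * exp(-beta * eps * k) for k in range(n + 1))

# Mathematician's route: evaluate the generating function (1 + x)^n
# at x = e^(-beta * eps).
Z_combinatorics = (1 + exp(-beta * eps)) ** n

print(isclose(Z_physics, Z_combinatorics))  # True -- the same counting job
```

The identity is just the binomial theorem; the “physics” enters only through which value of x you plug in.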
Paragraph 5: Biology is Information
Biological development is actually a communication problem. When a fruit fly embryo develops, chemical signals tell the cells where they are and what they should become (e.g., a head or a wing). This works exactly like a radio transmission. The chemical signal is the “message,” but it is fuzzy and noisy. The embryo uses specific patterns (Turing patterns)—like stripes and spots—to “error-correct” the signal, ensuring the body plan is read correctly despite the noise.
Paragraph 6: The Physical Cost of Thinking
Ultimately, all these systems—the ribosome, the musician, the embryo—are performing computation. They are processing information. However, physics imposes a strict limit on computation called Landauer’s Principle. This law states that anytime you process information (specifically, when you erase a bit of data to make a decision), you must pay a cost in energy, which is released as heat. This connects the world of logic to the world of physics. It explains why some problems are unsolvable by computers (the Halting problem) and why time only moves forward (entropy). Logic and Physics are tied together: you cannot perform a calculation without changing the physical world.
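Landauer’s bound is one of the few claims here you can put a number on: erasing one bit at temperature T dissipates at least k_B·T·ln(2) of heat. At room temperature that works out to a few zeptojoules per bit (this calculation is my own addition, not part of the AI’s text):

```python
from math import log

k_B = 1.380649e-23  # Boltzmann constant, J/K (exact in the 2019 SI)
T = 300.0           # room temperature, kelvin

E_min = k_B * T * log(2)  # Landauer limit per erased bit
print(E_min)  # ~2.87e-21 joules
```

Real chips dissipate many orders of magnitude more than this per bit operation, which is why the bound matters as a theoretical floor rather than a practical constraint today.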
This was a wild ride! I hope you enjoyed the experiment. I was impressed by their ability to draw from so many different fields simultaneously and to understand each other. Regarding the actual value and depth of their insights, I’m not sure. Whatever I say about it has the potential to age like milk.
More than anything, I think it shows the potential of multidisciplinary thinking: even if they didn’t invent anything new or deep yet (or maybe they did, who knows), they demonstrated what this type of thinking can do.
And above all, they showed us our own limitations. It’s not that we didn’t know about these limitations already, but the AIs clearly demonstrated yet another area in which they are already at a superhuman level of ability: understanding text that takes a strongly multidisciplinary approach, and drawing on knowledge from multiple disparate areas.