Pre, Mid, Post-Training Way of Life

*"For here there is no place / that does not see you. You must change your life."*

— Rilke’s Archaic Torso of Apollo

Building a large language model happens in three main stages.

Pre-training processes trillions of tokens scraped from everywhere, taken in with little discrimination. The model learns to predict the next word from the words before it, and the deceptively simple task demands massive compute. Tens of thousands of GPUs for months, tolerating messy, uncurated data dredged from the internet’s sediment. Each layer of data becomes another layer of silt, building a multidimensional echo of the collected human experience. The accumulated text of human civilization fed forward without editorial judgment. Here, more …

Post-training is the inverse: the model now exists, and the question shifts from what it knows to what it becomes. Data quality becomes everything: reinforcement learning from human feedback, direct preference optimization, constitutional AI, rejection sampling with teacher committees. A few perfect examples outweigh a million mediocre ones. Quality is compression, excellence, exclusion. Compute footprint shrinks to a fraction of pre-training’s cost. The curatorial burden expands to absorb it. Someone has to decide what ‘good’ looks like.

Mid-training sits between these phases, though fewer practitioners name it directly yet. It is the industrialization of quality, the discovery that you can manufacture discernment at scale. The model is fed carefully constructed data: machine-generated text filtered by other machines trained to judge it, high-quality sources deliberately overrepresented, answers transformed into questions so the model learns to ask what it already knows how to solve. This is neither the indiscriminate hunger of pre-training nor the precious hand-curation of post-training. It is the assembly line where taste is mass-produced. The foundation is set, but values remain unwritten. In this liminal space, the model is shown, again and again, what ‘better’ looks like.

LLM metaphors have begun to infiltrate my everyday language. I’ll describe someone’s intellect as ‘high tokens per second,’ request a ‘context window’ before a catch-up, or laugh about ‘updating my weights’ after criticism. Like all metaphors, these shrink the wildness of people into neat, lossy packages. Yet the three training phases persist in my mind—a powerful lens, a rough taxonomy for not just how people think, but how they navigate the world.

But 2025 complicated this triptych. The industry began to behave as if there were a new major stage, longer and hungrier than the thin epochs of instruction-tuning we had grown used to. What changed was the discovery of a specific asymmetry: the realization that while it is difficult to solve a complex problem, it is often trivial to recognize the solution. A Sudoku puzzle, a line of code, a proof. These are tasks with a hard, opaque front and a transparent back.
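The transparent back of the Sudoku can be made concrete. What follows is a minimal sketch, not anyone's production reward function: the names `make_example_solution` and `is_valid_solution` are hypothetical, invented here for illustration. Finding a solution takes search; checking one takes a handful of set comparisons.

```python
def make_example_solution():
    """Build one known-valid 9x9 grid from a shifting pattern (illustrative only)."""
    return [[(3 * (r % 3) + r // 3 + c) % 9 + 1 for c in range(9)] for r in range(9)]


def is_valid_solution(grid):
    """Return True if grid is a complete 9x9 Sudoku solution.

    Hypothetical verifier: it checks a finished grid and does no solving.
    """
    # Reject anything that is not exactly 9 rows of 9 cells.
    if len(grid) != 9 or any(len(row) != 9 for row in grid):
        return False

    digits = set(range(1, 10))
    rows = [set(row) for row in grid]
    cols = [set(col) for col in zip(*grid)]
    boxes = [
        {grid[r][c] for r in range(br, br + 3) for c in range(bc, bc + 3)}
        for br in (0, 3, 6)
        for bc in (0, 3, 6)
    ]
    # Every row, column, and 3x3 box must contain each digit 1..9 exactly once.
    return all(group == digits for group in rows + cols + boxes)
```

The check touches each cell three times and stops. Nothing in it cares how the grid was produced, which is what makes the verdict cheap to scale.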

We left behind the era of ‘taste’—that delicate, costly consensus that resists scaling—and entered the age of the indisputable result. Once rewards became verifiable and objective, the gentle, exhausting labor of human judgment was no longer required. Now, the model simply grinds against an indifferent reality, unpersuadable and unmoved, until it collides with the truth.

We once spent everything on “eating the world,” a vast ingestion followed by the decorative sculpting of human preference. But the focus has shifted. Now, the model is left to grind against tasks that possess a fixed, indifferent clarity. It invents a private ladder. And it climbs without a break.

The pre-training mind is defined by a frantic, lateral hunger. It is driven by two things: volume of data and the brute force of compute. As a result, people of this mind read everything, take every meeting, and believe it’s all “a numbers game.” We call them “machines,” a term that has become less a slur and more a kind of secular canonization. Their work ethic is undeniable, yet there is something ghostly about their trajectory; they run with a ferocity that suggests they are less interested in the destination than in the pure, frictionless sensation of the running.

Their knowledge is vast and rhizomatic, a sprawling map with no capital. They speak on any topic with a fluency that is as impressive as it is ungrounded—a glittering surface with no detectable taproot. If you ask them what it is all for, the question strikes them as a category error. Why wouldn’t you do this? For them, the exhilaration of capacity is its own justification. They are the architects of the hive, comfortable managing armies toward a goal that remains conveniently abstract.

There is a sense of collectivism here, a philosophy that metabolizes difficulty into capability without ever pausing to ask what that capability serves. Understanding is not a requirement; coordination is the only sacrament. In this regime, belonging is the operating system. When you are part of the collective, purpose doesn’t need to be articulated; it can be assumed.

By conventional metrics, they have won. They have inhaled the earth. And it is important to realize that their victory is not a relic. The massive, humming data centers of 2025 were not built by critics or tinkerers; they were willed into existence by the grit of the pre-training mind. Pre-training minds are the only ones capable of the “more is more” logistics required to pave the planet in silicon.

But 2025 introduced a new, chilling irony into their labor. They spent their lives building the cathedral of “scale,” assuming that the height of the ceiling would be the measure of their soul. Instead, they discovered that they had merely built a very large room for a very different guest.

Reasoning is indifferent to how much you consume; it cares only for the friction of hesitation. It is the model’s secret dialogue, testing and discarding ideas in the shadows before a single word comes out. For the pre-training mind, this is an existential crisis! They have become the system’s oxygen — essential, everywhere, and invisible to the very life they uphold. They are the pavement’s strength, gazing up at the passing cars, bewildered by a world they no longer know how to steer.

The post-training minds are rare, and they know it. They cultivate their scarcity as an identity. These are the people who read primary sources, who spend an afternoon at the Frick, who refuse meetings with devastating politeness. It isn’t just snobbery; it’s the terror that their attention, the only thing they truly own, is being diluted by the world’s noise. They care about things that feel like a nuisance to everyone else: the grain of a specific paper, the historical baggage of a typeface, the minute, agonizing space between two words that a machine would treat as identical.

Something dishonest sits at the center of their discernment. They rarely admit, even to themselves, that their refined taste was trained on a corpus they did not personally ingest. They inherited a foundation and called it “vision.” They stand on the roof of a house and critique the masonry of the basement, forgetting they didn’t carry a single brick. For them, every choice is a debt: the burden of making something worthy of the accumulation they rely on.

They see the shape of excellence so early and so vividly that the act of beginning becomes a betrayal. The gap between the vision and the execution is not a distance to be traveled, but a canyon that makes the first step look absurd. To do nothing begins to feel like a moral act—a refusal to further pollute the world with the very slop they have spent their lives learning to despise. They are the world’s most gifted editors, which is a polite way of saying they are its most paralyzed creators. Their constraint is not a lack of knowledge, but a profound failure of the will — the inability to forgive themselves for producing something that is merely good.

Yet, on the rare occasion they do produce, the work possesses a density that raw effort cannot mimic. Every choice feels load-bearing, a structural necessity. Here, sparseness is not a lack of resources but a form of compression—the way a poem is not a failed novel, but a reality where the air has been removed.

Their deepest intuition is a steady contempt for the “solved” answer. They suspect that any truth which can be fully captured by a metric is, by definition, no longer true; it has become a mere artifact of the test. They watch as others grow specialized spikes to cover the benchmarks, a frantic biological adaptation to a landscape of artificial incentives. To the post-trained, this is not intelligence, but a sophisticated form of mimicry. They understand that the more we optimize for the legible, the more we lose the scent of the real. For them, discernment is the art of standing in the rain and feeling the specific, unrepeatable cold. It is a commitment to the jagged, the unreached, and the stubbornly unmeasurable.

They dwell in the world’s undigitized residue, certain that the soul lives not in the signal, but in the stubborn, beautiful noise machines are designed to overlook. It is the last place a person can be found—shivering, unrepeatable, and whole.

The mid-training mind thrives on restless, kinetic empiricism. If the pre-trained mind is a library and the post-trained mind a gallery, the mid-trained mind is a flight simulator. They distrust any knowledge that cannot be proven in the moment. For them, theory is just another way to stall. Their education is a relentless, high-speed loop of trial and error—exhausting to the pre-trained, vulgar to the post-trained.

Gilbert Ryle distinguished between “knowing that” (propositional knowledge) and “knowing how” (procedural skill). Mid-training people live entirely in the latter. They don’t care that a bike stays upright through physics; they simply want to ride. They will fall seventeen times to get there, and they will never read the paper, even after they’ve mastered the balance. Schools struggle with them because they reward the pre-trained (retention) and the post-trained (judgment), but have no category for the student who fails physics but builds a working radio in the garage.

Mid-training’s power comes from manufacturing judgment at scale, synthetic data loops where machines generate attempts and other machines filter them, quality emerging from volume without the slow accumulation of human taste. The mid-training mind operates the same way: they iterate rapidly through generated possibilities, trusting the loop to surface something that works. They have learned to produce the appearance of discernment without ever developing the faculty itself.
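A toy version of that loop, offered as a sketch under loud assumptions: the task (adding two integers), the noisy generator, and the names `generate_attempts`, `judge`, and `build_filtered_dataset` are all hypothetical stand-ins. A real pipeline would put a language model in the generator's seat and another model or a verifier in the judge's.

```python
import random


def generate_attempts(question, n, rng):
    """Hypothetical noisy generator: propose n answers to a + b, some of them wrong."""
    a, b = question
    return [a + b + rng.choice([-2, -1, 0, 0, 1, 2]) for _ in range(n)]


def judge(question, answer):
    """Hypothetical filter: accept an attempt only if it is actually correct."""
    a, b = question
    return answer == a + b


def build_filtered_dataset(questions, attempts_per_question, seed=0):
    """Generate many attempts per question and keep one that survives the judge."""
    rng = random.Random(seed)  # seeded so repeated runs produce the same data
    dataset = []
    for question in questions:
        attempts = generate_attempts(question, attempts_per_question, rng)
        kept = [answer for answer in attempts if judge(question, answer)]
        if kept:  # questions with no surviving attempt are simply dropped
            dataset.append((question, kept[0]))
    return dataset
```

No step in the loop improves the generator. Quality enters only at the filter, so the dataset can be no better than the judge that admits it.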

Mid-training shares common ancestry with the rise of vibe coding. They are both expressions of the same metabolic shift: a move away from the grueling, upfront labor of ‘knowing’ toward a high-frequency, reactive ‘finding.’ You don’t plan your way to correctness; you gradient descent your way there. The work ceases to be a construction and becomes a series of attempts, each one slightly less broken than the last.

They move from boxes to canvases. A chat interface demands a finished thought; a spatial interface just asks where you want to put the pieces. As Karpathy suggested, these ‘dense reward environments’ are vital for modern cognitive workflows because they replace the rigid commitment of the sentence with the flexible logic of the map.

For the mid-training mind, the goal is no longer to reach a destination or produce a correct result, but simply to remain in the state of doing. They have traded the burden of the “why” for the frictionless high of the “next,” betting everything on the belief that as long as the kinetic energy is high enough, we keep going.

These characters come to life like the Karamazov brothers—each reminds me of someone I know intimately as a friend, coworker, or family member. We are all some unstable amalgamation of the three, the ratios shifting with the weather of our lives: with age, with humiliation, with the sudden, disorienting arrival of success.

Even as I write this, I realize I am engaged in my own interpretability task—a desperate attempt to classify the self. I wonder if this framework is just another sophisticated optimization, a way to be a more “efficient” observer of my own life. We tell ourselves that to be a writer, or an investor, is to be obsessed with new ways of seeing. But I’ve come to suspect that clarity has very little to do with the engine of the intellect. It is, instead, a matter of what you can metabolize without lying to yourself. It is the rare, quiet ability to hold your own impulses at arm’s length and finally choose them, rather than merely obey them.

We are all inhabited by these phases. Some people are possessed by the collective, disappearing into the warm comfort of the pre-trained mass. Some are possessed by the tyranny of taste, frozen in a post-training posture of permanent critique. Others are possessed by the frantic, kinetic ecstasy of motion, addicted to the mid-training high of ‘what’s next.’

I happened to be reading Dostoevsky’s *The Dream of a Ridiculous Man* while working on this essay, and it stopped me cold.

The story is simple. A man has decided that nothing matters. The world makes no difference to him, and he plans to kill himself. But that night, he falls asleep and dreams. He dreams of another earth—identical to ours but unfallen. The people there live in pure bliss. They know no cruelty, no jealousy, no need to prove anything. They love without suspicion. They die without fear. They have no science because they have no questions; their knowledge is not acquired but inhabited, like breath.

And he corrupts them. Simply by being among them, he introduces the lie. They fall. They learn shame, then cruelty, then science. They build temples to ideas they no longer believe. They invent justice to manage the wreckage.

What haunts me is not the fall, but how they respond to it:

“Granted we’re deceitful, wicked and unjust, we know that and weep for it, and we torment ourselves over it, and torture and punish ourselves perhaps even more than that merciful judge who will judge us and whose name we do not know. But we have science, and through it we shall again find the truth, but we shall now accept it consciously, knowledge is higher than feelings, the consciousness of life is higher than life. Science will give us wisdom, wisdom will discover laws, and knowledge of the laws of happiness is higher than happiness.”

This is the voice of the mid-trained mind at its most articulate and most damned. We know we are reward-hacking. We know the metrics do not capture reality. But we tell ourselves that knowing we are lost is a kind of wisdom. We believe that the consciousness of the problem is higher than solving it.

But the ridiculous man doesn’t out-argue them. When he wakes, something has changed in him that he cannot explain. He has not learned a new concept. He simply woke up carrying a different law inside him, one that made his former clarity look like a symptom of an illness. A dream rewrote the reward function. In that rewrite, all his cleverness became irrelevant.

“I saw the truth,” he says, “it’s not that my mind invented it, but I saw it, I saw it, and its living image filled my soul for all time.”

He goes out to preach what he saw. Everyone laughs at him. They call him mad. They say it was just a dream, a hallucination, a delusion. He knows they will laugh. He preaches anyway.

Aren’t we all that ridiculous man who wants to dance? We are ridiculous not because we’re foolish, but because we try to justify what is prior to justification. We want to prove logically that life is worth living, when the proof is always something smaller and more humiliating. It is a face, a kindness, or a moment of unguarded love.

The three phases may not be personality types. They are seasons of a single life. There is a season to ingest the world with vulgar appetite because you are building a foundation. There is a season to test yourself against reality because only friction reveals your actual shape. And there is a season to curate because attention is the soul’s only finite resource.

The deeper lesson of 2025 is that mid-training revealed a third path: you don’t need infinite data or perfect taste if you can teach machines to manufacture judgment. But the same infrastructure enables something darker. When the reward is verifiable, optimization can run longer than taste can tolerate. The model invents ladders. The human does too. These ladders can lead upward or they can lead nowhere. The real upgrade is learning to choose your reward function. It is not asking what you are capable of, but what you are becoming. It is not asking how to win, but what kind of winning would make you despise yourself.

And maybe this is what the ridiculous man returns to teach. He knows they will laugh. He preaches anyway:

“The main thing is—love others as yourself, that’s the main thing, and it’s everything, there’s no need for anything else at all: it will immediately be discovered how to set things up. And yet this is merely an old truth, repeated and read a billion times, but still it has never taken root!”

His authority is not intelligence but fidelity, the willingness to remain faithful to something that arrived without justification and refuses to submit to it. This is the thing no training regime can teach. Not taste, not speed, not range, not even the ability to reason, but the capacity to be remade by something unexplainable, and to let that remaking govern you even when the world calls you ridiculous. To step back into the noise—the meetings, the metrics, the slop, the benchmaxxing—and refuse to let it rewrite what you saw.

Token by token, brick by brick, we train ourselves either toward a larger freedom or toward a more elegant cage. And the difference is rarely intellect. It is what we are willing to protect as sacred.
