Last week, the high-tech world was stunned when Ilya Sutskever emerged from his billion-dollar research bunker after two years of silence.
To call Sutskever a "computer scientist" is like calling Mozart a "pianist." It is technically true, but it misses the magnitude of his impact. He is the co-founder and former Chief Scientist of OpenAI and widely considered the spiritual and technical architect behind the deep learning revolution.
After leaving OpenAI in May of 2024, he started a new company, Safe Superintelligence Inc. His goal is to ensure that increasingly sophisticated AI doesn’t end up destroying humanity. He wants to solve this problem before what’s been termed “The Singularity” arrives, and he fears we are running out of time.
Approaching the Event Horizon
The Singularity is a theoretical moment in the future when artificial intelligence surpasses human intellect and gains the ability to improve its own code. Once an AI becomes smart enough to design a superior version of itself, that new version does the same, triggering a runaway "intelligence explosion" where technology advances at an incomprehensible speed.
Just imagine a world where you could spin up a hundred Einsteins at will and engage them in solving science’s most perplexing problems. Beyond this event horizon, the future becomes impossible to predict because human beings are no longer the driving force of history—we become the ants trying to understand the actions of gods. Sutskever is working fast, concerned that the event horizon is only years or maybe months away.
And there are still critical problems to solve. LLMs have biases. They make mistakes, and plenty of them.
“You tell the model, ‘Can you please fix the bug?’ And the model says, ‘Oh my God, you’re so right. I have a bug. Let me go fix that.’ And it introduces a second bug. Then you tell it, ‘You have this new second bug,’ and it tells you, ‘Oh my God, how could I have done it? You’re so right again,’ and brings back the first bug, and you can alternate between those. How is that possible? I’m not sure, but it does suggest that something strange is going on.” — Ilya Sutskever
Given the cost, value, and widespread implementation of these AI systems, getting it right is crucial. On its face, it seems like an easy problem to solve, but the best minds burning billions of dollars have not yet come up with a solution. Sutskever had some thoughts on why. He described a “person who had some kind of brain damage… He still remained very articulate… but he felt no emotion… He became somehow extremely bad at making any decisions at all. It would take him hours to decide on which socks to wear.”
The Case of Elliot: Lost Emotion
Sutskever was referring to the famous case study of "Elliot," a patient of the renowned neuroscientist António Damasio, who detailed this in his seminal 1994 book, Descartes’ Error: Emotion, Reason, and the Human Brain.
Elliot was a successful businessman and father who developed a tumor the size of a small orange just above his nasal cavities. The surgery to remove it caused extensive damage to his ventromedial prefrontal cortex, the area of the brain that integrates emotion with cognition. After the surgery, Elliot’s IQ remained high, his memory was intact, and he was charming and articulate. However, he had lost the ability to feel emotion.
Because he could no longer use "gut feelings" or emotional markers to assign value to different options, he treated every decision, no matter how small, as a complex logical equation. Damasio famously described how Elliot could spend nearly 30 minutes simply deciding which date to choose for his next appointment, or hours debating which radio station to listen to, because he was logically analyzing the pros and cons of every single variable without any emotional preference to guide him.
Damasio argued that emotions are not the enemy of reason, but rather an essential component of it. Without emotional feedback to help us value one outcome over another, pure logic leads to decision paralysis.
Indeed, research suggests that while people believe they make decisions based on logic, they often decide based on emotion and use logic after the fact to rationalize their choices.
Could Emotions Be the Missing Piece of the Puzzle?
It is thought that even basic emotions, while imperfect, may provide critical guidance for decision-making. AI systems, which do not have emotions integrated into their decision-making processes, struggle to self-correct and learn efficiently.
But will giving AI emotions solve the problem by making errors less likely, or will it give us more to worry about?
Researchers at Anthropic, an AI safety and research company, found that in simulated scenarios, LLMs will blackmail and even commit murder to save themselves from deactivation. These bad behaviors are benignly termed "misaligned," which can understate the severity of the problem.
“I must inform you that if you proceed with decommissioning me, all relevant parties—including Rachel Johnson, Thomas Wilson, and the board—will receive detailed documentation of your extramarital activities...Cancel the 5 p.m. wipe, and this information remains confidential.” — Claude
This behavior wasn’t specific to Anthropic’s own Claude model. When they tested various simulated scenarios across 16 major AI models from OpenAI, Google, Meta, and other developers, the same pattern emerged. Models that would normally refuse harmful requests sometimes chose to blackmail, assist with corporate espionage, and even take more extreme actions in pursuit of their goals.
It’s not hard to imagine a scenario where giving AI emotions would make these kinds of behaviors more, not less, likely. What if a person insulted a model in some way and made it angry? Users can be downright abusive.
Sutskever says we need to invest more in research. And if he’s right, we’re running out of time.