4 min read · 2 days ago
–
Hype is blurring the line between fluent language and actual intelligence.
Recent work has explored how close today’s large language models, such as GPT-5, are to reaching artificial general intelligence (AGI).
A recent paper, A Coherence-Based Measure of AGI, estimates that GPT-5 is at most 24% of the way toward AGI, while GPT-4 (2023) was only around 7%. The progress is significant, but we’re still far from true general intelligence.
To measure it, we must first define it. Numerous definitions have been proposed, but a recent one by Hendrycks et al. (2025) states:
“AGI is an AI that can match or exceed the cognitive versatility and proficiency of a well-educated adult.”
Drawing on the Cattell–Horn–Carroll (CHC) theory of cognition, this definition can be evaluated across ten human-aligned cognitive domains:
- On-the-spot reasoning
- Working memory
- Long-term memory storage
- Long-term memory retrieval
- Visual processing
- Auditory processing
- Processing speed
- Mathematics
- Reading & writing
- General knowledge
Each domain is evaluated against human-level proficiency.
Below are the capabilities of GPT-4 (2023) and GPT-5 (2025) across these cognitive domains.
The capabilities of GPT-4 and GPT-5. Here GPT-5 answers questions in ‘Auto’ mode. Image from Hendrycks et al. 2025.
But how should we interpret these ten scores?
If we simply take the arithmetic mean of the ten domains, we can produce a single “% toward AGI” score. That arithmetic approach suggests GPT-5 is about 58% of the way to AGI.
While intuitive, the arithmetic mean makes a strong implicit assumption:
Strength in some faculties can compensate for catastrophic weakness in others.
In human cognition, this is not true. A person with near-perfect knowledge but essentially no long-term memory, perceptual grounding, or real-time responsiveness would not be considered “halfway to human intelligence.”
Intelligence is not the sum of isolated skills; it is a system of interdependent faculties.
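A small numerical sketch makes the compensation problem concrete. The two score profiles below are made-up values chosen for illustration (they are not the published GPT-4 or GPT-5 scores): one is moderately capable everywhere, the other is perfect in half the domains and absent in the rest.

```python
from statistics import mean

# Hypothetical domain scores in [0, 1]; illustrative values only,
# not the published GPT-4 or GPT-5 results.
balanced = [0.5] * 10            # moderate ability in every domain
spiky = [1.0] * 5 + [0.0] * 5    # five perfect domains, five missing ones

balanced_mean = mean(balanced)
spiky_mean = mean(spiky)
weakest_balanced = min(balanced)
weakest_spiky = min(spiky)

print(f"balanced: mean={balanced_mean:.2f}, weakest={weakest_balanced:.2f}")
print(f"spiky:    mean={spiky_mean:.2f}, weakest={weakest_spiky:.2f}")
```

Both profiles report the same arithmetic mean of 0.50, even though the second one has no ability at all in half of its domains.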
The Hidden Bottlenecks
On the surface, GPT-5 appears impressive: perfect or near-perfect scores in mathematics, reading/writing, and general knowledge. But look deeper, and you’ll spot the crucial deficits:
- 0% long-term memory storage
- fragile retrieval accuracy
- weak visual grounding
- low processing speed
These are system-limiting bottlenecks, and they point to the central issue: the arithmetic mean rewards peaks and hides cliffs. Under that permissive definition, GPT-5 looks more than halfway to AGI.
But if we instead ask for high and stable performance across all cognitive domains…
…the picture changes dramatically.
A Coherence-Aware Metric
Comparison of model performance across aggregation exponents p. Curves show AGI_p; shaded regions indicate the AUC. Image from Fourati 2025.
In my recent paper, A Coherence-Based Measure of AGI, I propose a different approach: aggregating the domain scores with a generalized mean whose compensability exponent p can vary.
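For reference, the generalized (power) mean has the standard form below. The symbols s_i (the score of domain i, on a 0 to 1 scale) and n (the number of domains, here ten) are notation assumed for this sketch; the paper's own notation and its treatment of zero scores may differ. The p = 0 case is defined as a limit, which equals the geometric mean when all scores are positive.

```latex
\[
\mathrm{AGI}_p \;=\; \left( \frac{1}{n} \sum_{i=1}^{n} s_i^{\,p} \right)^{1/p} \quad (p \neq 0),
\qquad
\mathrm{AGI}_0 \;=\; \lim_{p \to 0} \mathrm{AGI}_p \;=\; \left( \prod_{i=1}^{n} s_i \right)^{1/n}.
\]
```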
- When p = 1 (arithmetic), strengths can hide weaknesses.
- When p = 0 (geometric), imbalance is penalized.
- When p = −1 (harmonic), the weakest domain dominates.
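The three cases above can be sketched with a small power-mean function. This is a minimal illustration, not the paper's implementation: the function name `generalized_mean` is hypothetical, scores are assumed to lie in [0, 1], and a zero score is assumed to collapse the result to 0 whenever p ≤ 0 (the mathematical limit).

```python
import math


def generalized_mean(scores, p):
    """Power mean of scores in [0, 1] with exponent p (hypothetical helper)."""
    if not scores or any(s < 0.0 or s > 1.0 for s in scores):
        raise ValueError("scores must be a non-empty sequence in [0, 1]")
    n = len(scores)
    # Assumption: for p <= 0, a single zero score drives the mean to 0.
    if p <= 0 and any(s == 0.0 for s in scores):
        return 0.0
    # p = 0 is the geometric mean (the limit of the power mean as p -> 0).
    if abs(p) < 1e-12:
        return math.exp(sum(math.log(s) for s in scores) / n)
    return (sum(s ** p for s in scores) / n) ** (1.0 / p)


# Hypothetical two-domain profile: one strong domain, one weak domain.
profile = [1.0, 0.25]
arithmetic = generalized_mean(profile, 1)   # strengths offset weaknesses
geometric = generalized_mean(profile, 0)    # imbalance is penalized
harmonic = generalized_mean(profile, -1)    # the weak domain dominates

print(arithmetic, geometric, harmonic)
```

For the same profile, the score shrinks as p decreases, which is the sense in which lower exponents are less forgiving of imbalance.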
To summarise robustness across the full continuum of compensability, we can compute the area under the AGI_p curve (AUC) over the range of p, which integrates performance into a single value.
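A rough sketch of such an AUC computation follows. Several details are assumptions made for illustration and may not match the paper: the integration range is taken as p from −1 to 1 (the endpoints named above), the area is divided by the width of that range so the result stays in [0, 1], the integral is approximated with the trapezoidal rule, and a zero score yields 0 for every p ≤ 0. The function names and the two profiles are hypothetical.

```python
import math


def generalized_mean(scores, p):
    """Power mean of scores in [0, 1] with exponent p (hypothetical helper)."""
    n = len(scores)
    # Assumption: for p <= 0, a single zero score drives the mean to 0.
    if p <= 0 and any(s == 0.0 for s in scores):
        return 0.0
    if abs(p) < 1e-12:  # geometric mean as the p -> 0 limit
        return math.exp(sum(math.log(s) for s in scores) / n)
    return (sum(s ** p for s in scores) / n) ** (1.0 / p)


def coherence_auc(scores, p_min=-1.0, p_max=1.0, steps=2000):
    """Normalized area under the AGI_p curve, via the trapezoidal rule."""
    width = (p_max - p_min) / steps
    values = [generalized_mean(scores, p_min + i * width)
              for i in range(steps + 1)]
    area = width * (sum(values) - 0.5 * (values[0] + values[-1]))
    return area / (p_max - p_min)


# Hypothetical profiles with the same arithmetic mean (0.5).
balanced_auc = coherence_auc([0.5] * 10)
spiky_auc = coherence_auc([1.0] * 5 + [0.0] * 5)

print(f"balanced AUC: {balanced_auc:.3f}")
print(f"spiky AUC:    {spiky_auc:.3f}")
```

Because every generalized mean of a constant profile equals that constant, the balanced profile keeps its score, while the spiky profile falls well below its arithmetic mean of 0.5.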
Under this coherence-aware AGI metric:
- GPT-4 scores 7%
- GPT-5 scores 24%
The implication is stark:
We are no more than ~24% of the way to coherent general intelligence.
Not because progress is illusory, but because imbalances constrain functionality.
Long-term memory, adaptive reasoning, multimodal grounding, and real-time responsiveness remain structurally immature.
Why This Matters
A model with:
- perfect knowledge,
- perfect short-term memory,
…but 0% long-term memory storage is not close to general intelligence. It cannot retain experience, form plans over time, or develop durable representations of the world.
Coherence reveals that what appears to be general intelligence is actually stacked specialization.
And until catastrophic weaknesses are eliminated, not merely averaged away, the “general” part of AGI remains unrealized.
Looking Ahead
True progress toward AGI will require closing bottlenecks in:
- persistent memory,
- real-time responsiveness,
- multimodal reasoning,
- grounded abstraction,
- continuous learning.
Thus, a coherence-aware measure forces us to confront the valleys.
And that shift in perspective may be what keeps our measurements aligned with what general intelligence actually requires.
References
- Fares Fourati (2025) “A Coherence-Based Measure of AGI” https://arxiv.org/pdf/2510.20784v1
- Dan Hendrycks et al. (2025) “A Definition of AGI” https://arxiv.org/pdf/2510.18212