This is not a critique of engineering choices.
It is a structural argument: **given how transformers are built, hallucination is not a bug — it is inevitable.**
By “hallucination,” I mean confident, specific assertions that are **not supported by the available evidence**, rather than obvious nonsense.
The claim here is simple:
> Any system that always produces an answer, while collapsing evidence into a single point before checking consistency, must hallucinate on some inputs.
Transformers do exactly this.
---
## 1. What a Transformer Actually Does
At the heart of every transformer is attention. Stripped to its essentials, an attention head:
1. Takes a set of signals (tokens, memories, documents, tools)
2. Scores them against a query
3. Produces a weighted average
4. Treats that average as the meaning of the situation
No matter how deep the network is, how large the model is, or how clever the training regime is, **this core move does not change**.
Evidence goes in. A single vector comes out.
There is no explicit notion of:
- contradiction
- inconsistency
- impossibility
- or refusal
The architecture assumes that a point-valued answer always exists.
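To make that core move concrete, here is a minimal sketch of a single attention head in NumPy. The names (`attention`, `query`, `keys`, `values`) are generic placeholders rather than any particular library's API; the point is only that however rich the evidence, the output is one weighted average.

```python
import numpy as np

def attention(query, keys, values):
    """A single attention head, stripped to its essentials: score each
    piece of evidence against the query, softmax the scores, and collapse
    everything into one weighted average."""
    scores = keys @ query                    # step 2: score signals against the query
    weights = np.exp(scores - scores.max())  # softmax (numerically stable)
    weights /= weights.sum()
    return weights @ values                  # steps 3 and 4: one vector, treated as "the meaning"

# Three pieces of evidence, one query: the result is always a single point.
keys = np.random.randn(3, 4)
values = np.random.randn(3, 4)
query = np.random.randn(4)
print(attention(query, keys, values))        # shape (4,): evidence in, one vector out
```

Nothing in this routine can return "these signals conflict" or "no meaning exists": the only possible output is a vector.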
---
## 2. Why Averaging Is the Critical Mistake
Imagine you are given two perfectly reliable statements:
- “The bike is red.”
- “The bike is not red.”
A human immediately recognizes that something is wrong. The correct response is not a guess but a pause: *these statements cannot both be true at once*.
A transformer does something else.
Internally, these statements are represented as vectors. Attention computes a weighted combination of those vectors and normalizes the result. The output is a new direction — *somewhere between* the two.
But there is no such thing as “half red and half not red.”
The model has fabricated a meaning that **no witness supports**.
This is hallucination.
Crucially: the model did not make a mistake. It did exactly what it was designed to do.
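A toy numerical version of the same effect, assuming we hand-craft 2-D embeddings for the two statements (a deliberate simplification, not how a real model encodes sentences):

```python
import numpy as np

# Toy 2-D embeddings: axis 0 = "is red", axis 1 = "is about the bike".
red     = np.array([ 1.0, 1.0])   # "The bike is red."
not_red = np.array([-1.0, 1.0])   # "The bike is not red."

# Equal attention weights: both witnesses are treated as equally reliable.
weights = np.array([0.5, 0.5])
blended = weights @ np.stack([red, not_red])
blended /= np.linalg.norm(blended)   # the usual normalization step

print(blended)   # [0. 1.] -- "something about the bike", with the color claim gone
```

The blended vector is a perfectly well-formed representation that neither statement asserts; the disagreement has simply been averaged away.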
---
## 3. The Missing Step: Checking Whether an Answer Exists
Before choosing an answer, a system should ask a simpler question:
> Is there any answer that satisfies all the evidence at once?
Humans do this naturally. Most algorithms do not.
Transformers never ask this question.
They merge information first and only reason afterward. Once evidence has been collapsed into a single point, **it is already too late** to detect inconsistency. Contradiction has been smoothed away.
This is why post‑hoc “fact checking,” “self‑reflection,” or safety prompts do not work reliably. They operate on an already hallucinated internal state.
You cannot detect contradiction after you’ve averaged it out.
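Continuing the toy embeddings from above, here is a sketch of why the order matters. On the raw set of vectors, even a crude pairwise check can flag the conflict; after the average, there is nothing left to flag. The `consistent` helper and its similarity criterion are illustrative assumptions, not a component of any real model.

```python
import numpy as np

red     = np.array([ 1.0, 1.0])   # "The bike is red."
not_red = np.array([-1.0, 1.0])   # "The bike is not red."
evidence = [red, not_red]

def consistent(vectors):
    """Crude check on the raw evidence: treat any pair of vectors with
    non-positive similarity as a conflict (a toy criterion for this example)."""
    return all(np.dot(a, b) > 0
               for i, a in enumerate(vectors) for b in vectors[i + 1:])

print(consistent(evidence))          # False -- before merging, we could still refuse

merged = np.mean(evidence, axis=0)   # what attention hands to the next layer
print(merged)                        # [0. 1.] -- after merging, nothing left to check
```

Any check that runs downstream of the merge is inspecting the fabricated vector, not the evidence.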
---
## 4. Ambiguity Is Not an Error — It’s a Region
Not all uncertainty is contradiction. Often, the evidence is simply *underspecified*.
For example:
- “The meeting is sometime next week.”
There are many valid interpretations. A correct system should acknowledge that multiplicity.
Transformers do not represent sets of possible meanings. They represent **one**.
To choose a single output from an ambiguous situation, the system must introduce bias:
- from training data
- from prompt wording
- from safety heuristics
- from reinforcement learning
This bias is usually hidden and unacknowledged.
When the evidence is weak but the system still answers, the result feels confident — but that confidence is artificial.
Again: hallucination is not accidental. It is the result of forcing ambiguity into a single point.
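As a contrast, here is a minimal sketch of treating the ambiguity as an explicit set rather than a point, assuming we model "sometime next week" as a set of candidate dates. The `next_week` helper is hypothetical and exists only for illustration.

```python
import datetime as dt

def next_week(today: dt.date) -> set[dt.date]:
    """'Sometime next week' as a region: the set of all days it could mean."""
    monday = today + dt.timedelta(days=7 - today.weekday())
    return {monday + dt.timedelta(days=i) for i in range(7)}

interpretations = next_week(dt.date(2024, 5, 1))

if len(interpretations) == 1:
    answer = interpretations.pop()   # the evidence pins down a single day
else:
    answer = None                    # stay ambiguous instead of silently picking one

print(len(interpretations), answer)  # 7 None
```

Collapsing that set to one date is possible, but only by importing a preference the evidence itself does not contain.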
---
## 5. Long Context Makes This Worse, Not Better
It is tempting to think that adding more context solves hallucination.
In fact, it often does the opposite.
As more information flows into attention, two things happen:
1. Genuine evidence accumulates
2. Contradictions and weak signals accumulate faster
Attention cannot represent conflicting structure. As context grows, attention weights flatten, distinctions collapse, and the model begins to respond with generic or fabricated assertions.
This is not a scaling failure.
It is a geometric one.
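A rough numerical illustration of the flattening effect, using random key vectors as a crude stand-in for a long, mixed-quality context (an assumption of this sketch, not a measurement of any specific model):

```python
import numpy as np

def attention_entropy(num_tokens: int, dim: int = 64, seed: int = 0) -> float:
    """Entropy of softmax attention weights over `num_tokens` random keys.
    Higher entropy means flatter weights: less ability to single out evidence."""
    rng = np.random.default_rng(seed)
    query = rng.standard_normal(dim) / np.sqrt(dim)
    keys = rng.standard_normal((num_tokens, dim))
    scores = keys @ query
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return float(-(w * np.log(w)).sum())

for n in (16, 256, 4096):
    print(n, round(attention_entropy(n), 2), "max possible:", round(np.log(n), 2))
# The entropy climbs with context length: the weights spread out instead of
# concentrating on the evidence that matters.
```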
---
## 6. Why “Always Answer” Guarantees Hallucination
Here is the unavoidable conclusion:
- Some questions are contradictory
- Some questions are underspecified
- Some questions admit no valid answer
Any system that is required to always produce an output must, on those inputs, invent one.
There is no clever training trick or larger dataset that avoids this.
If refusal is not an allowed output, hallucination is.
---
## 7. Refusal Is Not a Safety Feature
Most systems treat refusal as an external patch:
- a policy rule
- a moderation layer
- a last‑ditch defense
This is backwards.
Refusal is the **mathematically correct response** when:
- no interpretation satisfies the evidence
- or the space of interpretations is too broad to choose from honestly
In those cases, answering is the unsafe action.
---
## 8. What Would a Non‑Hallucinating System Do?
A system that does not hallucinate must:
1. Preserve evidence as a **set**, not a point
2. Check whether the set is internally consistent
3. Represent ambiguity explicitly
4. Allow “no answer” as a stable outcome
5. Introduce bias only when explicitly instructed
This requires a different execution order:
> **admissibility before generation**
Transformers do this in reverse.
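A compact sketch of that execution order, with hypothetical names (`Verdict`, `decide`) chosen only for illustration: the candidate set is narrowed by every constraint first, and an answer is produced only if exactly one interpretation survives.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Verdict:
    status: str                  # "answer" | "ambiguous" | "no_answer"
    value: Optional[str] = None

def decide(candidates: set[str],
           constraints: list[Callable[[str], bool]]) -> Verdict:
    """Admissibility before generation: keep evidence as a set, check it,
    and let refusing to answer be a stable outcome."""
    survivors = {c for c in candidates if all(ok(c) for ok in constraints)}
    if not survivors:
        return Verdict("no_answer")              # contradiction: refuse
    if len(survivors) > 1:
        return Verdict("ambiguous")              # underspecified: say so
    return Verdict("answer", survivors.pop())    # one consistent interpretation remains

print(decide({"red", "blue"}, [lambda x: x != "blue"]))   # answer: red
print(decide({"red", "blue"}, [lambda x: x == "green"]))  # no_answer
print(decide({"red", "blue"}, []))                        # ambiguous
```

Generation happens, if at all, only after the set has been checked; refusal and ambiguity are ordinary return values rather than bolted-on exceptions.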
---
## 9. The Core Insight
Transformers hallucinate not because they are poorly trained, but because they are *too decisive*.
They commit to an answer before asking whether an answer exists.
Until architectures change to allow:
- inconsistency to block generation
- ambiguity to remain unresolved
- refusal to be first‑class
hallucination will remain a fundamental property of the system.
Not a flaw.
A consequence.
---
## 10. The Point of This Essay
This is not an attack on transformers.
They are extraordinary tools.
But we should stop pretending that hallucination is an engineering defect we can “mostly fix.” It is a direct result of collapsing meaning too early.
If we want systems that reason safely in high‑stakes settings, we need architectures that respect when the right answer is:
> **I don’t know.**
Until then, hallucination is not optional.
It is guaranteed.