Take a piece of paper and draw three dots on it, at the corners of an imaginary equilateral triangle. Looking at that piece of paper, which pair of dots is closest to each other? There is no answer: every pair is equally far apart.
Now, take that same piece of paper and fold it along the line midway between two of the dots, so that those two dots are nearly touching each other. Ask yourself again: which pair of dots is closest? The answer is obvious.
What this exercise shows is the power of adding dimensions. Looking at the flat piece of paper with the three dots on it, you are seeing the situation in two dimensions. As soon as you fold the paper, you’ve introduced a new, third dimension, and distances and relationships that were not apparent at the lower dimensionality become clear.
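The fold is easy to verify with a few lines of Python. This is just a sketch of the exercise: the coordinates and the fold angle are my own illustrative choices, not anything prescribed by it.

```python
import math

# Three dots at the corners of an equilateral triangle on the flat paper (2D).
A = (0.0, 0.0)
B = (1.0, 0.0)
C = (0.5, math.sqrt(3) / 2)

# In two dimensions every pair is the same distance apart,
# so "which pair is closest?" has no answer.
print(round(math.dist(A, B), 3), round(math.dist(A, C), 3), round(math.dist(B, C), 3))
# -> 1.0 1.0 1.0

# "Fold" the paper along the line x = 0.5: everything with x > 0.5
# rotates out of the plane into a third dimension.
def fold(point, angle):
    x, y = point
    if x <= 0.5:
        return (x, y, 0.0)  # this half of the paper stays put
    return (0.5 + (x - 0.5) * math.cos(angle), y, (x - 0.5) * math.sin(angle))

theta = math.radians(170)  # nearly folded flat
A3, B3, C3 = fold(A, theta), fold(B, theta), fold(C, theta)

# Now the answer is obvious: A and B almost touch,
# while C (which sits on the fold line) stays where it was.
print(round(math.dist(A3, B3), 3))  # much less than 1
print(round(math.dist(A3, C3), 3))  # still 1.0
```

The dots never moved on the paper; only the paper itself was bent through a higher dimension, and that alone changed which pair is closest.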
Why is this relevant to AI? To answer that, it is necessary to understand how transformers (a key element of large language models, or LLMs) work, and the easiest way to understand transformers is to look at a bit of history. (I’ll summarize and leave out some details; even if this story is not an exactly accurate historical record, it has proved invaluable to me in understanding how we got here.)
Translating text between two languages is a challenge that computer scientists struggled with for a long time. Initial efforts focused on providing "dictionaries" that a program could use to look up what a word in one language translates to in another. This approach has two major problems. The first is that not every pair of languages has a one-to-one correspondence of words, terms, or concepts. The second, and more serious if you want a system that can translate arbitrary text between any two languages, is that the number of language pairs you’d need dictionaries for explodes combinatorially as you add languages, while the supply of paired examples needed to build those dictionaries does not keep up.
It was at this point that the computer scientists hit on an idea: what if there existed some magical "universal" language? Then you wouldn’t need to be concerned with every possible pair of languages. Instead, you could simply write a program that converted between each real language and this "universal" language, enabling translation between any arbitrary pair of languages via this intermediate. But how does one discover a "universal" language?
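The arithmetic behind this idea is worth making concrete: with n languages, pairwise dictionaries require n·(n−1)/2 dictionaries, while a universal intermediate needs only n converters (one per language, to and from the intermediate). A quick sketch, with arbitrary example counts:

```python
from math import comb

# Compare the two approaches as the number of languages grows.
for n in (5, 20, 100):
    pairwise = comb(n, 2)  # one dictionary per pair: n * (n - 1) / 2
    via_universal = n      # one converter per language, to/from the intermediate
    print(f"{n} languages: {pairwise} pairwise dictionaries vs {via_universal} converters")
# 5 languages: 10 pairwise dictionaries vs 5 converters
# 20 languages: 190 pairwise dictionaries vs 20 converters
# 100 languages: 4950 pairwise dictionaries vs 100 converters
```

Adding language number 101 costs one more converter instead of 100 more dictionaries, which is why the intermediate-language idea was so appealing.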