In Part 1, we learned what an LLM is and how it generates text. Now let’s go deeper into how models like ChatGPT actually process language internally.
This article covers:
- What a token really is
- How tokenization works
- Encoding & decoding with Python
- Vector embeddings
- Positional encoding
- Self-attention & multi-head attention
1. What Is a Token?
A token is a piece of text converted into a number that the model understands.
Example:
A → 1
B → 2
C → 3
So if you type: B D E → it becomes → 2 4 5
LLMs don’t understand words. They understand numbers.
This process of converting text → numbers is called tokenization.
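The letter-to-number mapping above can be sketched as a toy tokenizer. This is only an illustration: real tokenizers work on subwords learned from data, not single letters.

```python
# Toy letter-level tokenizer mirroring the A→1, B→2, C→3 mapping above.
vocab = {chr(ord("A") + i): i + 1 for i in range(26)}  # A→1 ... Z→26
inverse = {v: k for k, v in vocab.items()}

def encode(text):
    # Keep only letters that exist in our tiny vocabulary
    return [vocab[ch] for ch in text if ch in vocab]

def decode(tokens):
    return "".join(inverse[t] for t in tokens)

print(encode("BDE"))      # [2, 4, 5]
print(decode([2, 4, 5]))  # BDE
```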
2. What Is Tokenization?
Tokenization means:
Converting user input into a sequence of numbers that the model can process.
Workflow:
Text → Tokens → Model → Tokens → Text
Example:
Input:
"Hey there, my name is Piyush"
Internally becomes:
[20264, 1428, 225216, 3274, ...]
These numbers go into the transformer, which predicts the next token again and again.
👉 Note: every model has its own tokenizer, so the same text can map to different token IDs in different models.
3. Encoding & Decoding Tokens in Python
Using the tiktoken library:
import tiktoken

# Load the tokenizer that matches a specific model
encoder = tiktoken.encoding_for_model("gpt-4o")

text = "Hey there, my name is Prabhas Kumar"

# encode(): text → list of token IDs
tokens = encoder.encode(text)
print(tokens)

# decode(): token IDs → back to readable text
decoded = encoder.decode(tokens)
print(decoded)
What happens:
- encode() → converts text → tokens
- decode() → converts tokens → readable text
This encode → decode round trip is the same first and last step ChatGPT performs internally.
4. Vector Embeddings – Giving Words Meaning
Tokens alone are just numbers. Embeddings give them meaning.
An embedding is a vector (list of numbers) that represents the semantic meaning of a word.
Example idea:
- Dog and Cat → close together
- Paris and London → close together
- Eiffel Tower and India Gate → close together
Words with similar meaning are placed near each other in vector space.
That’s how LLMs understand relationships like:
Paris → Eiffel Tower
India → Taj Mahal
This is called semantic similarity.
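Semantic similarity is usually measured with cosine similarity between embedding vectors. Here is a minimal sketch with made-up 3-dimensional vectors; real models use hundreds or thousands of learned dimensions.

```python
import math

# Toy embeddings (invented numbers, purely illustrative)
embeddings = {
    "dog":   [0.90, 0.80, 0.10],
    "cat":   [0.85, 0.75, 0.15],
    "paris": [0.10, 0.20, 0.90],
}

def cosine_similarity(a, b):
    # Dot product of a and b, divided by the product of their lengths
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity(embeddings["dog"], embeddings["cat"]))    # close to 1.0
print(cosine_similarity(embeddings["dog"], embeddings["paris"]))  # much lower
```

A score near 1.0 means the vectors point the same way, i.e. the words are semantically close.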
5. Positional Encoding – Order Matters
Consider:
- "Dog ate cat"
- "Cat ate dog"
Same words. Different meaning.
Embeddings alone don’t know position. So the model adds positional encoding.
Positional encoding tells the model:
- This word is first
- This word is second
- This word is third
So the model understands order and structure.
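One common scheme for this is the sinusoidal positional encoding from the original Transformer paper: each position gets a unique vector of sine and cosine values, which is added to the token's embedding. A minimal sketch:

```python
import math

def positional_encoding(position, d_model=8):
    # Sinusoidal positional encoding: alternating sin/cos at
    # different frequencies gives every position a unique vector.
    pe = []
    for i in range(0, d_model, 2):
        angle = position / (10000 ** (i / d_model))
        pe.append(math.sin(angle))
        pe.append(math.cos(angle))
    return pe

# Because position 0 and position 1 get different vectors,
# "Dog ate cat" and "Cat ate dog" produce different model inputs.
print(positional_encoding(0))
print(positional_encoding(1))
```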
6. Self-Attention – Words Talking to Each Other
Self-attention lets tokens influence each other.
Example:
- "river bank"
- "ICICI bank"
Same word: bank. Different meaning.
Self-attention allows:
- "river" → changes meaning of "bank"
- "ICICI" → changes meaning of "bank"
So context decides meaning.
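The mechanism behind this is scaled dot-product attention. The sketch below is deliberately simplified: each token vector acts as its own query, key, and value, whereas real models first multiply by learned Q, K, and V weight matrices.

```python
import math

def softmax(xs):
    # Subtract the max for numerical stability
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    d = len(vectors[0])
    out = []
    for q in vectors:
        # Score this token against every token (including itself)
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in vectors]
        weights = softmax(scores)
        # Each output is a weighted mix of ALL token vectors
        mixed = [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(d)]
        out.append(mixed)
    return out

# Two-token "sentence": after attention, each token's vector
# contains a share of the other token, which is how "river"
# can shift the meaning of "bank".
tokens = [[1.0, 0.0], [0.0, 1.0]]
print(self_attention(tokens))
```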
7. Multi-Head Attention – Looking at Many Angles
Multi-head attention means the model runs several attention heads in parallel, each looking at the sentence from a different angle:
- Meaning
- Position
- Context
- Relationship
At the same time.
Like a human observing many things at once.
This gives the model a deep understanding of the sentence.
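One way to sketch this: split each vector into slices, run attention on each slice independently (one "head" per slice), and concatenate the results. Real models also apply learned projection matrices per head, which this toy version omits.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(vectors):
    # Same simplified self-attention as before: identity Q, K, V
    d = len(vectors[0])
    out = []
    for q in vectors:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in vectors]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(d)])
    return out

def multi_head_attention(vectors, n_heads=2):
    d = len(vectors[0])
    head_dim = d // n_heads
    heads_out = []
    for h in range(n_heads):
        # Each head sees only its own slice of every vector
        slices = [v[h * head_dim:(h + 1) * head_dim] for v in vectors]
        heads_out.append(attention(slices))
    # Concatenate the per-head outputs back into full-size vectors
    return [sum((heads_out[h][i] for h in range(n_heads)), [])
            for i in range(len(vectors))]

tokens = [[1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0]]
print(multi_head_attention(tokens))
```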
8. Final Flow of an LLM
User enters text
Tokenization → numbers
Embeddings → meaning
Positional encoding → order
Self + Multi-head attention → context
Linear + Softmax → probability of next token
Decode → readable output
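The last two steps of that flow can be sketched end to end. The logits below are invented numbers for a hypothetical 4-word vocabulary; softmax turns them into probabilities, and greedy decoding picks the most likely next token.

```python
import math

# Hypothetical final-layer logits (raw scores) for a tiny vocabulary
logits = {"dog": 2.0, "cat": 1.0, "mat": 3.5, "the": 0.5}

def softmax_dict(scores):
    # Subtract the max for numerical stability, then normalize
    m = max(scores.values())
    exps = {t: math.exp(s - m) for t, s in scores.items()}
    total = sum(exps.values())
    return {t: e / total for t, e in exps.items()}

probs = softmax_dict(logits)
next_token = max(probs, key=probs.get)  # greedy decoding: take the argmax

print(probs)
print(next_token)  # "mat" has the highest logit, so greedy decoding picks it
```

In practice, models often sample from this distribution (temperature, top-p) instead of always taking the argmax, which is why the same prompt can give different answers.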
Final Thoughts
LLMs don’t know language. They predict tokens based on probability and patterns.
Yet the result feels intelligent because:
- Tokens carry meaning (embeddings)
- Order is preserved (positional encoding)
- Context is understood (attention)
And that’s the magic behind ChatGPT.