In Part 1, we learned what an LLM is and how it generates text. Now let’s go deeper into how models like ChatGPT actually process language internally.
This article covers:
- What a token really is
- How tokenization works
- Encoding & decoding with Python
- Vector embeddings
- Positional encoding
- Self-attention & multi-head attention
1. What Is a Token?
A token is a small chunk of text mapped to a number (an ID) that the model understands.
Example:
A → 1
B → 2
C → 3
So if you type: B D E → it becomes → 2 4 5
LLMs don’t understand words. They understand numbers.
This process of converting text → numbers is called tokenization.
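The A→1, B→2 mapping above can be sketched as a toy tokenizer. This is only an illustration of the idea; real tokenizers like the one in ChatGPT use learned subword vocabularies, not single letters.

```python
# Toy character-level tokenizer: A→1, B→2, ..., Z→26
vocab = {chr(ord("A") + i): i + 1 for i in range(26)}

def encode(text):
    # Convert each known character into its number
    return [vocab[ch] for ch in text if ch in vocab]

print(encode("BDE"))  # [2, 4, 5]
```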
2. What Is Tokenization?
Tokenization means:
Converting user input into a sequence of numbers that the model can process.
Workflow:
Text → Tokens → Model → Tokens → Text
Example:
Input:
"Hey there, my name is Piyush"
Internally becomes:
[20264, 1428, 225216, 3274, ...]
These numbers go into the transformer, which predicts the next token again and again.
👉 Note: Every model has its own tokenizer, so the same text can produce different token numbers in different models.
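The Text → Tokens → Model → Tokens → Text workflow can be sketched as a loop. Here `model_predict_next` is a hypothetical stand-in for the transformer's next-token prediction, and `encode`/`decode` are whatever tokenizer the model uses.

```python
# Sketch of the generation loop (model_predict_next is a hypothetical
# placeholder for the real transformer)
def generate(encode, decode, model_predict_next, prompt, max_new_tokens):
    tokens = encode(prompt)                      # Text → Tokens
    for _ in range(max_new_tokens):
        next_token = model_predict_next(tokens)  # Model predicts next token
        tokens.append(next_token)                # Feed it back in, again and again
    return decode(tokens)                        # Tokens → Text
```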
3. Encoding & Decoding Tokens in Python
Using the tiktoken library:
```python
import tiktoken

# Load the tokenizer that matches the model
encoder = tiktoken.encoding_for_model("gpt-4o")

text = "Hey there, my name is Prabhas Kumar"

# Text → token IDs
tokens = encoder.encode(text)
print(tokens)

# Token IDs → readable text
decoded = encoder.decode(tokens)
print(decoded)
```
What happens:
- encode() → converts text → tokens
- decode() → converts tokens → readable text
This encode/decode step is the same one that happens at the boundary of models like ChatGPT.
4. Vector Embeddings – Giving Words Meaning
Tokens alone are just numbers. Embeddings give them meaning.
An embedding is a vector (list of numbers) that represents the semantic meaning of a word.
Example idea:
- Dog and Cat → close together
- Paris and Delhi → close together
- Eiffel Tower and India Gate → close together
Words with similar meaning are placed near each other in vector space.
That’s how LLMs understand relationships like:
Paris → Eiffel Tower
India → Taj Mahal
This is called semantic similarity.
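Semantic similarity is usually measured with cosine similarity between embedding vectors. The 3-dimensional vectors below are made up purely for illustration; real embeddings have hundreds or thousands of dimensions.

```python
import math

# Hypothetical tiny embeddings (real ones are learned by the model)
embeddings = {
    "dog": [0.9, 0.8, 0.1],
    "cat": [0.85, 0.75, 0.15],
    "car": [0.1, 0.2, 0.95],
}

def cosine_similarity(a, b):
    # 1.0 means same direction (similar meaning), near 0 means unrelated
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity(embeddings["dog"], embeddings["cat"]))  # high
print(cosine_similarity(embeddings["dog"], embeddings["car"]))  # lower
```

"Dog" and "cat" point in nearly the same direction, so their score is close to 1; "dog" and "car" score much lower.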
5. Positional Encoding – Order Matters
Consider:
- "Dog ate cat"
- "Cat ate dog"
Same words. Different meaning.
Embeddings alone don’t know position. So the model adds positional encoding.
Positional encoding tells the model:
- This word is first
- This word is second
- This word is third
So the model understands order and structure.
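One common scheme is the sinusoidal positional encoding from the original Transformer paper, where each position gets a distinct vector of sine and cosine values. This is a minimal sketch of that idea; many modern models use learned or rotary position embeddings instead.

```python
import math

def positional_encoding(seq_len, d_model):
    # Each position gets a unique vector built from sines and cosines
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)        # even dimensions: sine
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)  # odd dimensions: cosine
    return pe

pe = positional_encoding(seq_len=3, d_model=4)
# pe[0], pe[1], pe[2] are all different, so the model can tell
# "first word" from "second word" from "third word".
```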
6. Self-Attention – Words Talking to Each Other
Self-attention lets tokens influence each other.
Example:
- "river bank"
- "ICICI bank"
Same word: bank. Different meaning.
Self-attention allows:
- "river" → changes meaning of "bank"
- "ICICI" → changes meaning of "bank"
So context decides meaning.
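The mechanism behind this is scaled dot-product attention: every token scores its similarity to every other token, turns those scores into weights, and mixes in the other tokens' vectors accordingly. The sketch below is a stripped-down single head with no learned query/key/value projection matrices, which a real transformer would have.

```python
import math

def softmax(xs):
    # Turn raw scores into weights that sum to 1
    exps = [math.exp(x - max(xs)) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(X):
    # Minimal single-head attention: queries, keys, and values are the
    # input vectors themselves (no learned projections)
    d = len(X[0])
    out = []
    for q in X:
        # Score this token against every token, scaled by sqrt(d)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in X]
        weights = softmax(scores)
        # Output = weighted mix of all token vectors
        out.append([sum(w * v[j] for w, v in zip(weights, X)) for j in range(d)])
    return out

# Two-token toy sequence: each output is a blend of both inputs,
# which is how "river" can pull the meaning of "bank" toward itself.
X = [[1.0, 0.0], [0.0, 1.0]]
Y = self_attention(X)
```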