How Tokenization, Embeddings & Attention Work in LLMs (Part 2)

In Part 1, we learned what an LLM is and how it generates text. Now let’s go deeper into how models like ChatGPT actually process language internally.

This article covers:

  • What a token really is
  • How tokenization works
  • Encoding & decoding with Python
  • Vector embeddings
  • Positional encoding
  • Self-attention & multi-head attention

1. What Is a Token?

A token is a small piece of text (a word, part of a word, or even a single character) that the model maps to a number it can work with.

Example:

A → 1
B → 2
C → 3

So if you type: B D E → it becomes → 2 4 5

LLMs don’t understand words. They understand numbers.

This process of converting text → numbers is called tokenization.
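The toy A → 1, B → 2 mapping above can be sketched in a few lines of Python (the vocabulary here is made up purely for illustration; real tokenizers have vocabularies of tens of thousands of entries):

```python
# Toy tokenizer: map each letter to a number (hypothetical vocabulary).
vocab = {letter: i + 1 for i, letter in enumerate("ABCDE")}  # A→1, B→2, ..., E→5

def tokenize(text):
    """Convert space-separated letters into a list of token IDs."""
    return [vocab[ch] for ch in text.split()]

print(tokenize("B D E"))  # [2, 4, 5]
```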

2. What Is Tokenization?

Tokenization means:

Converting user input into a sequence of numbers that the model can process.

Workflow:

Text → Tokens → Model → Tokens → Text

Example:

Input:

"Hey there, my name is Piyush"

Internally becomes:

[20264, 1428, 225216, 3274, ...]

These numbers go into the transformer, which predicts the next token again and again.

👉 Note: Every model has its own tokenizer, so the same text can produce different token IDs in different models.

3. Encoding & Decoding Tokens in Python

Using the tiktoken library:

import tiktoken  # pip install tiktoken

# Load the tokenizer that matches the gpt-4o model
encoder = tiktoken.encoding_for_model("gpt-4o")

text = "Hey there, my name is Prabhas Kumar"
tokens = encoder.encode(text)  # text → list of token IDs

print(tokens)

decoded = encoder.decode(tokens)  # token IDs → readable text
print(decoded)

What happens:

  • encode() → converts text → tokens
  • decode() → converts tokens → readable text

This is the same kind of encoding and decoding ChatGPT performs internally, before and after the transformer runs.

4. Vector Embeddings – Giving Words Meaning

Tokens alone are just numbers. Embeddings give them meaning.

An embedding is a vector (list of numbers) that represents the semantic meaning of a word.

Example idea:

  • Dog and Cat → close together (both animals)
  • Paris and India → close together (both places)
  • Eiffel Tower and India Gate → close together (both landmarks)

Words with similar meaning are placed near each other in vector space.

That’s how LLMs understand relationships like:

Paris → Eiffel Tower
India → Taj Mahal

This is called semantic similarity.
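Semantic similarity is usually measured with cosine similarity between embedding vectors. A minimal sketch, using tiny made-up 3-dimensional vectors (real models learn embeddings with hundreds or thousands of dimensions):

```python
import math

# Made-up 3-dimensional embeddings, purely for illustration.
embeddings = {
    "dog": [0.9, 0.8, 0.1],
    "cat": [0.85, 0.75, 0.15],
    "paris": [0.1, 0.2, 0.9],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1 means similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity(embeddings["dog"], embeddings["cat"]))    # close to 1
print(cosine_similarity(embeddings["dog"], embeddings["paris"]))  # much lower
```

With these toy numbers, "dog" and "cat" score near 1 while "dog" and "paris" score much lower, which is exactly the "close together in vector space" idea.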

5. Positional Encoding – Order Matters

Consider:

  • "Dog ate cat"
  • "Cat ate dog"

Same words. Different meaning.

Embeddings alone don’t know position. So the model adds positional encoding.

Positional encoding tells the model:

  • This word is first
  • This word is second
  • This word is third

So the model understands order and structure.
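One common way to encode position is the sinusoidal scheme from the original Transformer paper: each position gets a distinct vector built from sine and cosine waves, which is then added to the word's embedding. A small sketch (the dimension of 8 is chosen just to keep the output short):

```python
import math

def positional_encoding(position, d_model=8):
    """Sinusoidal positional encoding: even dimensions use sin,
    odd dimensions use cos, at wavelengths that vary per dimension."""
    pe = []
    for i in range(d_model):
        angle = position / (10000 ** (2 * (i // 2) / d_model))
        pe.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return pe

# Every position gets a different vector, so "Dog ate cat" and
# "Cat ate dog" no longer look identical to the model.
print(positional_encoding(0))  # first word
print(positional_encoding(1))  # second word
```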

6. Self-Attention – Words Talking to Each Other

Self-attention lets tokens influence each other.

Example:

  • "river bank"
  • "ICICI bank"

Same word, bank, but a different meaning in each phrase.

Self-attention allows:

  • "river" → changes meaning of "bank"
  • "ICICI" → changes meaning of "bank"

So context decides meaning.
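The core of this is scaled dot-product attention: each token scores itself against every other token, the scores go through a softmax, and each token's output becomes a weighted blend of all token vectors. A minimal sketch with made-up 4-dimensional vectors and no learned weight matrices (real transformers also project the input through learned query, key, and value matrices):

```python
import numpy as np

def self_attention(X):
    """Minimal scaled dot-product self-attention (no learned weights):
    each token's output is a context-weighted mix of all tokens."""
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)                    # token-to-token similarity
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over each row
    return weights @ X                               # blend vectors by attention weight

# Toy vectors for "river" and "bank" (values are made up).
X = np.array([[1.0, 0.0, 1.0, 0.0],   # "river"
              [0.0, 1.0, 0.0, 1.0]])  # "bank"
out = self_attention(X)
print(out)  # "bank"'s output vector is now pulled toward "river"'s
```

Multi-head attention simply runs several of these attention computations in parallel, each with its own learned projections, so different heads can track different kinds of relationships.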
