How Tokenization, Embeddings & Attention Work in LLMs (Part 2)

In Part 1, we learned what an LLM is and how it generates text. Now let’s go deeper into how models like ChatGPT actually process language internally.

This article covers:

  • What a token really is
  • How tokenization works
  • Encoding & decoding with Python
  • Vector embeddings
  • Positional encoding
  • Self-attention & multi-head attention

1. What Is a Token?

A token is a small piece of text (a word, part of a word, or even a single character) that the model maps to a number it can work with.

Example:

A → 1
B → 2
C → 3

So if you type: B D E → it becomes → 2 4 5

LLMs don’t understand words. They understand numbers.

This process of converting text → numbers is called tokenization.
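The toy A → 1, B → 2 mapping above can be sketched in a few lines of Python (the vocabulary here is made up purely for illustration; real tokenizers have vocabularies of tens of thousands of entries):

```python
# Toy tokenizer: map each letter to a number (hypothetical vocabulary).
vocab = {letter: i + 1 for i, letter in enumerate("ABCDE")}  # A→1, B→2, ..., E→5

def tokenize(text):
    """Convert space-separated letters into a list of token IDs."""
    return [vocab[ch] for ch in text.split()]

print(tokenize("B D E"))  # [2, 4, 5]
```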

2. What Is Tokenization?

Tokenization means:

Converting user input into a sequence of numbers that the model can process.

Workflow:

Text → Tokens → Model → Tokens → Text

Example:

Input:

"Hey there, my name is Piyush"

Internally becomes:

[20264, 1428, 225216, 3274, ...]

These numbers go into the transformer, which predicts the next token again and again.

👉 Note: Every model has its own tokenizer, so the same text can produce different token IDs in different models.

3. Encoding & Decoding Tokens in Python

Using the tiktoken library:

import tiktoken  # pip install tiktoken

# Load the tokenizer that matches the gpt-4o model
encoder = tiktoken.encoding_for_model("gpt-4o")

text = "Hey there, my name is Prabhas Kumar"
tokens = encoder.encode(text)  # text → list of token IDs

print(tokens)

decoded = encoder.decode(tokens)  # token IDs → readable text
print(decoded)

What happens:

  • encode() → converts text → tokens
  • decode() → converts tokens → readable text

This is the same kind of encoding and decoding ChatGPT performs internally, before and after the transformer runs.

4. Vector Embeddings – Giving Words Meaning

Tokens alone are just numbers. Embeddings give them meaning.

An embedding is a vector (list of numbers) that represents the semantic meaning of a word.

Example idea:

  • Dog and Cat → close together (both animals)
  • Paris and India → close together (both places)
  • Eiffel Tower and India Gate → close together (both landmarks)

Words with similar meaning are placed near each other in vector space.

That’s how LLMs understand relationships like:

Paris → Eiffel Tower
India → Taj Mahal

This is called semantic similarity.
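Semantic similarity is usually measured with cosine similarity between embedding vectors. A minimal sketch, using tiny made-up 3-dimensional vectors (real models learn embeddings with hundreds or thousands of dimensions):

```python
import math

# Made-up 3-dimensional embeddings, purely for illustration.
embeddings = {
    "dog": [0.9, 0.8, 0.1],
    "cat": [0.85, 0.75, 0.15],
    "paris": [0.1, 0.2, 0.9],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1 means similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity(embeddings["dog"], embeddings["cat"]))    # close to 1
print(cosine_similarity(embeddings["dog"], embeddings["paris"]))  # much lower
```

With these toy numbers, "dog" and "cat" score near 1 while "dog" and "paris" score much lower, which is exactly the "close together in vector space" idea.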

5. Positional Encoding – Order Matters

Consider:

  • "Dog ate cat"
  • "Cat ate dog"

Same words. Different meaning.

Embeddings alone don’t know position. So the model adds positional encoding.

Positional encoding tells the model:

  • This word is first
  • This word is second
  • This word is third

So the model understands order and structure.
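One common way to encode position is the sinusoidal scheme from the original Transformer paper: each position gets a distinct vector built from sine and cosine waves, which is then added to the word's embedding. A small sketch (the dimension of 8 is chosen just to keep the output short):

```python
import math

def positional_encoding(position, d_model=8):
    """Sinusoidal positional encoding: even dimensions use sin,
    odd dimensions use cos, at wavelengths that vary per dimension."""
    pe = []
    for i in range(d_model):
        angle = position / (10000 ** (2 * (i // 2) / d_model))
        pe.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return pe

# Every position gets a different vector, so "Dog ate cat" and
# "Cat ate dog" no longer look identical to the model.
print(positional_encoding(0))  # first word
print(positional_encoding(1))  # second word
```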

6. Self-Attention – Words Talking to Each Other

Self-attention lets tokens influence each other.

Example:

  • "river bank"
  • "ICICI bank"

Same word, bank, but a different meaning in each phrase.

Self-attention allows:

  • "river" → changes meaning of "bank"
  • "ICICI" → changes meaning of "bank"

So context decides meaning.
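The core of this is scaled dot-product attention: each token scores itself against every other token, the scores go through a softmax, and each token's output becomes a weighted blend of all token vectors. A minimal sketch with made-up 4-dimensional vectors and no learned weight matrices (real transformers also project the input through learned query, key, and value matrices):

```python
import numpy as np

def self_attention(X):
    """Minimal scaled dot-product self-attention (no learned weights):
    each token's output is a context-weighted mix of all tokens."""
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)                    # token-to-token similarity
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over each row
    return weights @ X                               # blend vectors by attention weight

# Toy vectors for "river" and "bank" (values are made up).
X = np.array([[1.0, 0.0, 1.0, 0.0],   # "river"
              [0.0, 1.0, 0.0, 1.0]])  # "bank"
out = self_attention(X)
print(out)  # "bank"'s output vector is now pulled toward "river"'s
```

Multi-head attention simply runs several of these attention computations in parallel, each with its own learned projections, so different heads can track different kinds of relationships.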
