Writing Your First Compiler - Part 3: Lexical Analysis
dev.to·1d·

The first step in most compilers is lexical analysis - breaking source code into tokens. When you see 2 + 3, you immediately recognize three separate things: a number, a plus sign, and another number. Your brain does this automatically. But a compiler has to explicitly walk through each character and decide which characters belong together and what kind of thing they form.

Let’s say we want to tokenize this expression:

2 + 3

We need to walk through the string character by character. When we see 2, that’s a number. The space carries no meaning here, so we skip it. The + is an operator. Then comes another space, which we skip too. Finally, 3 is another number.

So we end up with three tokens: NUMBER(2), PLUS(+), NUMBER(3).

That’s it. That’s what a lexer does - it turns a string of characters into a list of meaningful tokens.
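To make the walk-through concrete, here is a minimal sketch in Python. The language, the `tokenize` name, and the `(kind, text)` tuple shape are assumptions made for illustration, not taken from the original post. It handles only what the example needs: ASCII digits, `+`, and whitespace.

```python
from typing import List, Tuple

# Hypothetical token shape for this sketch: (kind, text), e.g. ("NUMBER", "2").
Token = Tuple[str, str]

DIGITS = "0123456789"


def tokenize(source: str) -> List[Token]:
    """Turn a string like "2 + 3" into a list of tokens."""
    tokens: List[Token] = []
    i = 0
    while i < len(source):
        ch = source[i]
        if ch.isspace():
            # Whitespace separates tokens but produces none.
            i += 1
        elif ch in DIGITS:
            # Consume the whole run of digits so "42" becomes one token.
            start = i
            while i < len(source) and source[i] in DIGITS:
                i += 1
            tokens.append(("NUMBER", source[start:i]))
        elif ch == "+":
            tokens.append(("PLUS", ch))
            i += 1
        else:
            # Anything else is outside this tiny language.
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    return tokens


print(tokenize("2 + 3"))
# prints [('NUMBER', '2'), ('PLUS', '+'), ('NUMBER', '3')]
```

The loop advances an index by hand rather than using a `for` loop over the characters, because a number can span several characters and the lexer has to decide how far to move on each step.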

Tokens and Lexeme…
