How does GPT-2 Tokenize Text?
Let’s explore how GPT-2 tokenizes text. What is tokenization? GPT-2 doesn’t work with strings directly. Instead, it first tokenizes the input string: it converts the string into a list of numbers, or “tokens”. These tokens are what get passed into the model, both during training and at inference. As a concrete example, let’s look at a few sample sentences:
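To make the string-to-numbers idea concrete, here is a minimal sketch in Python. It is a toy, not GPT-2's tokenizer: it assumes the simplest possible scheme, where every UTF-8 byte of the input becomes one integer ID. GPT-2's real tokenizer uses a learned byte pair encoding vocabulary, which this sketch does not reproduce, so the IDs below will not match GPT-2's. The names `toy_encode` and `toy_decode` are hypothetical and exist only for illustration.

```python
from typing import List


def toy_encode(text: str) -> List[int]:
    """Map a string to a list of integer IDs, one per UTF-8 byte.

    Assumption: a stand-in for a real tokenizer, with no learned vocabulary.
    """
    return list(text.encode("utf-8"))


def toy_decode(tokens: List[int]) -> str:
    """Invert toy_encode: turn the integer IDs back into the original string."""
    return bytes(tokens).decode("utf-8")


tokens = toy_encode("Hello world")
print(tokens)              # a list of integers, one per byte
print(toy_decode(tokens))  # the original string, recovered from the IDs
```

The point of the sketch is the round trip: the model only ever sees the list of integers, and a decoder maps integers back to text. A real tokenizer follows the same encode and decode pattern but typically produces far fewer tokens per sentence, because common character sequences are merged into single IDs.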