Words, once rich with meaning, dissolve into a symphony of numbers.
8 min read · Jan 9, 2025
Unlike images or numeric data, which already arrive as numbers a machine can process directly, text needs a little extra work. Words and sentences carry meaning, but a machine cannot work with them as they are. To make sense of them, we have to turn the words into numbers. Once the text is in numeric form, the machine can process it in its own mathematical world. One of the simplest ways to convert words into numbers is a method called “one-hot encoding.” For the purpose of this article, I will use the sentence “The bank was closed because of the flood” as the corpus for training and explaining how these conversions work.
To convert words into numbers, we first collect all the unique words in the corpus and arrange them alphabetically. This sorted list of unique words is the vocabulary. Each word in the vocabulary is then assigned a unique index based on its position in the list, so the model can refer to any word by its index.
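As a minimal sketch of this step, the vocabulary for our example sentence could be built as shown here. Two choices in it are my assumptions rather than anything fixed by the method: the text is lowercased so that “The” and “the” count as one word, and indices start at 0. The names `tokens`, `vocabulary`, and `word_to_index` are purely illustrative.

```python
corpus = "The bank was closed because of the flood"

# Assumption: lowercase first so "The" and "the" map to the same word.
tokens = corpus.lower().split()

# Unique words, arranged alphabetically: this is the vocabulary.
vocabulary = sorted(set(tokens))

# Assumption: indices start at 0 and follow the sorted order.
word_to_index = {word: index for index, word in enumerate(vocabulary)}

print(vocabulary)
print(word_to_index)
```

With these assumptions the eight-word sentence yields a vocabulary of seven entries, because “the” appears twice but is stored once.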
Once the vocabulary is created, the next step is to convert each word into a one-hot vector. A one-hot vector is a binary vector where all values are 0 except…