Basically, when you train a large language model you have to decide what kind of unit of text to feed it. Individual letters might seem intuitive, but for various reasons it works better to group common letter combinations together. These units can range from a single letter, to a pair of letters, to an entire word, or even a chunk spanning two words.
That's what a token is: a bit of text grouped together into one single object that ChatGPT can understand.
Usually a separate system (a tokenizer) is used to define the tokens and to convert text into tokens and vice versa. The most common letter combinations usually get assigned their own whole token, while less frequent ones are split up into multiple tokens.
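If you want to see this in action, here's a minimal sketch in Python. It assumes the tiktoken library (not named above, just one common choice), which exposes the token vocabulary used by some OpenAI models:

```python
# A small sketch using the tiktoken library (an assumption; the answer above
# doesn't name a specific tokenizer) to show how text gets split into tokens.
import tiktoken

# Load a tokenizer encoding used by recent OpenAI models.
enc = tiktoken.get_encoding("cl100k_base")

text = "Tokenization splits uncommon words into pieces"
token_ids = enc.encode(text)  # text -> list of integer token ids
print(token_ids)

# Decode each id separately to see the individual chunks of text.
pieces = [enc.decode([t]) for t in token_ids]
print(pieces)  # common words tend to be one token; rarer words split into several
```

Running something like this shows that everyday words usually come back as one piece, while unusual or made-up words get chopped into several smaller tokens, exactly the "common combinations get their own token" behavior described above.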