Today, Google announced the release of Gemini 1.5 Pro, its next-gen LLM.
Sundar Pichai posted: “Gemini 1.5 Pro, our mid-sized model, will soon come standard with a 128K-token context window, but starting today, developers + customers can sign up for the limited Private Preview to try out 1.5 Pro with a groundbreaking and experimental 1 million token context window!”
What does it mean to have a 1-million-token context window, and how does it compare with the previous Gemini 1.0 Pro and OpenAI’s GPT-4?
“Context window” is basically the model’s memory: it’s how much of what you’ve previously input into the model will still affect its results for your next input.
“Tokens” get a little weirder. They’re essentially pieces of meaning. A token can be a sentence, a word, or an individual letter – it really depends on the model and on how it processes input. When you input something into the model, it breaks it down into appropriately sized chunks of meaning (as another commenter mentioned, “red” being one token, vs. “r”, “e”, and “d” each being a token).
This particular model has a context window a million tokens wide, which means it can remember what you’ve said to it up to 1 million tokens back.
As a quick example, pretend the model has no context window at all (or one only as big as the size of the input you’ve just made to it). You say “this is Bob, and he’s a vampire.”
Next, you ask it “what is Bob?” And it’ll spit out something about Bob being a typical human name, because it has no memory of you telling it Bob is a vampire.
Now, the same model but with a context window: you tell it Bob is a vampire, and then when you go ask it what Bob is, it will remember that Bob is a vampire (as long as you didn’t tell it that too long ago). The width of the context window determines what “too long” is.
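Here’s a minimal toy sketch of that idea in Python. The “model” below is just a stand-in function, not a real LLM, and the window is measured in words rather than tokens for simplicity, but it shows how the vampire fact is only “remembered” while it still fits inside the window:

```python
# Toy illustration of a context window (not a real model): the "model" can only
# "see" the most recent WINDOW_SIZE words of the conversation. Once "vampire"
# has scrolled out of that window, the answer becomes generic again.

WINDOW_SIZE = 12  # pretend window, measured in words for simplicity

conversation = []

def visible_context():
    """Return only the words that still fit inside the context window."""
    words = " ".join(conversation).split()
    return " ".join(words[-WINDOW_SIZE:])

def toy_model_answer(question):
    # The question itself isn't used by this toy; only the visible context matters.
    context = visible_context()
    if "vampire" in context:
        return "Bob is a vampire."
    return "Bob is a common human name."

conversation.append("This is Bob, and he's a vampire.")
print(toy_model_answer("What is Bob?"))   # -> "Bob is a vampire."

# Flood the conversation with enough other text to push the vampire fact
# out of the window, and the model "forgets" it.
conversation.append("Anyway, let me tell you about my day in great detail " * 3)
print(toy_model_answer("What is Bob?"))   # -> "Bob is a common human name."
```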
ChatGPT has an 8,000-token context window (and, I think, an upgraded version with 32,000?). This means Gemini is able to ‘remember’ things for significantly longer than ChatGPT.
Large language models convert your input into a string of tokens, which is what they actually process. Each token can represent an entire word, a word fragment, or a single letter, depending on the implementation and context. Think of the tokens as the vocabulary of the LLM: if you type the word ‘red’, it might convert it into a single token that represents the word red, rather than ingesting the letters r-e-d individually.
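You can see this in practice with OpenAI’s open-source tiktoken tokenizer. It’s used here purely as an illustration; Gemini has its own tokenizer with different token IDs, but the idea is the same:

```python
# pip install tiktoken  -- OpenAI's open-source tokenizer, used only as an example.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

print(enc.encode("red"))              # a common short word usually maps to a single token ID
print(enc.encode("unbelievably"))     # longer or rarer words are often split into several tokens
print(enc.decode(enc.encode("red")))  # decoding the IDs gives back the original text
```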
The way LLMs work is that you give them a piece of text and tell them to predict the next word, over and over, so that they form sentences and answer you. But an AI model doesn’t really understand text, so instead it’s encoded as tokens: for example, the word “tokens” translates to some number, let’s say 30001. The entire text you give it is encoded as a series of such numbers, and the next word it generates is also a number. There is a dictionary to translate words to numbers and vice versa.
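As a rough sketch of that loop: here the real neural network is replaced by a canned stand-in function, and the “dictionary” is a tiny hard-coded vocabulary, but the encode → predict-next-token → repeat structure is the same idea:

```python
# Toy sketch of the "predict the next token, over and over" loop.
# The real model is a huge neural network; fake_model is just a stand-in,
# and the word<->number dictionary is a tiny hard-coded vocabulary.

vocab = {"<end>": 0, "the": 1, "cat": 2, "sat": 3, "on": 4, "mat": 5}
id_to_word = {i: w for w, i in vocab.items()}

def fake_model(token_ids):
    """Stand-in for the neural network: returns the ID of the 'next' token."""
    canned_reply = [1, 2, 3, 4, 1, 5, 0]  # "the cat sat on the mat <end>"
    return canned_reply[len(token_ids) % len(canned_reply)]

# Encode the prompt into numbers, then generate one token at a time.
prompt_ids = [vocab[w] for w in "the cat".split()]
generated = list(prompt_ids)
while True:
    next_id = fake_model(generated)
    if next_id == vocab["<end>"]:
        break
    generated.append(next_id)

print(" ".join(id_to_word[i] for i in generated))  # -> "the cat sat on the mat"
```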
Now, the length of that array – how many numbers you can and must give it – is fixed by the model design. If your model can work with a 10-word input array and you give it 20, you might as well not give it the first 10, because it can’t use them. In text generation you see this as the model “forgetting” the start of the conversation if it gets too long.
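In code terms, the effect is roughly like slicing the token array down to whatever fits. The numbers below are made up for illustration; real windows are thousands (or now millions) of tokens:

```python
# Anything beyond the last CONTEXT_WINDOW tokens simply can't influence the output.

CONTEXT_WINDOW = 10

token_ids = list(range(20))            # pretend this is a 20-token conversation
visible = token_ids[-CONTEXT_WINDOW:]  # the model can only use these
forgotten = token_ids[:-CONTEXT_WINDOW]

print("model sees:", visible)
print("effectively forgotten:", forgotten)
```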
Do you need million-word-long text generations? Probably not too often. But this context window is useful in other ways too: you can add all sorts of information there. If you want to ask a question about laws, for example, you put the relevant law text first and then ask your question about it. That way the AI can reference it the same way it can reference any past conversation you have had with it.
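A sketch of what that looks like in practice: the reference text and the question travel in the same prompt. `ask_model` is a placeholder for whichever API you would actually call (e.g. the Gemini API), so it’s left commented out:

```python
# "Stuffing" reference material into a big context window.

def build_prompt(reference_text, question):
    return (
        "Here is a document for reference:\n\n"
        f"{reference_text}\n\n"
        f"Using only the document above, answer this question: {question}"
    )

# Stand-in for the full text of a statute -- with a 1M-token window, an entire
# law code (plus your whole prior conversation) can fit here at once.
law_text = "Section 1: ... Section 2: ... (imagine hundreds of pages pasted here)"

prompt = build_prompt(law_text, "How long does copyright protection last?")
print(prompt)
# response = ask_model(prompt)  # hypothetical call to the model of your choice
```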
So the size of the context window is a pretty important quality of an LLM, but it also has a cost: a bigger context window means more memory and more computation for every response, and therefore more expense in running it.
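A back-of-the-envelope illustration of why: vanilla self-attention compares every token with every other token, so the work grows roughly with the square of the context length. Real systems use optimizations to soften this, so treat these numbers as an upper-bound sketch, not a benchmark:

```python
# Rough illustration of why bigger windows cost more (naive quadratic attention).

small_window = 8_000       # e.g. an 8K-token window
huge_window = 1_000_000    # Gemini 1.5 Pro's experimental window

length_ratio = huge_window / small_window
attention_ratio = length_ratio ** 2

print(f"{length_ratio:.0f}x more tokens to hold")        # 125x
print(f"~{attention_ratio:,.0f}x more attention work")   # ~15,625x (naive)
```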