eli5: Why does ChatGPT give responses word-by-word, instead of the whole answer straight away?


This goes for almost all AI language models that I’ve used.

I ask it a question, and instead of giving me a paragraph instantly, it generates a response word by word, sometimes pausing on a word for a second or two. Why can’t it just paste the entire answer straight away?

In: Technology

28 Answers

Anonymous 0 Comments

Because the specific task that these Large Language Models are trained to perform is not to “answer the question”, but to “guess the next word”. It just so happens that solving the problem of “guessing the next word” is apparently a decent enough method for solving problems and generating answers to questions.

Anonymous 0 Comments

It can’t give you a paragraph instantly, because the paragraph is not instantly available.

It is not a rendering gimmick. It is not generating the block of text in one go and then dripping it out to the recipient purely for aesthetics. The stream is fundamentally how it works. It’s an iterative process, and you’re seeing each iteration in real time as each word is predicted. The models work by taking a body of text as a prompt and then predicting what word should come next*. Each time a new word is generated, that new word is added to the prompt, and the whole new prompt is used in the next iteration. This is what allows successive iterations to remain “aware” of what has been generated so far.

The UI could have been built so that this whole cycle completes before printing the final result, but that would just mean waiting until the last word is generated, not getting the paragraph instantly. It may as well print each new word as and when it becomes available. When it gets stuck for a few seconds, it genuinely is waiting for that word to be generated.

*with some randomness to produce variety. A setting called the temperature controls how strongly it favors the most likely candidates over the long tail of unlikely ones.
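
Here’s a toy Python sketch of the loop described above, in case it helps. Everything in it is made up for illustration: a real model replaces the hand-written toy_logits with a neural network, and works on tokens rather than whole words, but the shape of the loop (score, sample, append, repeat) is the same.

```python
import math
import random

# Tiny stand-in vocabulary. "<end>" is a special token meaning
# "the answer is finished" (illustrative, not a real model's token).
VOCAB = ["the", "cat", "sat", "on", "mat", "<end>"]

def toy_logits(tokens):
    # Hand-written scores just to make the loop runnable; a real model
    # computes these with a neural network over the whole prompt.
    last = tokens[-1] if tokens else ""
    prefs = {
        "the": {"cat": 3.0, "mat": 2.5},
        "cat": {"sat": 3.0},
        "sat": {"on": 3.0},
        "on":  {"the": 3.0},
        "mat": {"<end>": 3.0},
    }.get(last, {})
    return [prefs.get(w, 0.1) for w in VOCAB]

def sample_next(tokens, temperature=0.7):
    # Temperature rescales the scores before they become probabilities:
    # low values sharpen the distribution, high values flatten it.
    logits = [score / temperature for score in toy_logits(tokens)]
    total = sum(math.exp(l) for l in logits)
    probs = [math.exp(l) / total for l in logits]
    return random.choices(VOCAB, weights=probs)[0]

def generate(prompt_tokens, max_new=20):
    tokens = list(prompt_tokens)
    for _ in range(max_new):
        word = sample_next(tokens)
        if word == "<end>":                   # model says it is done
            break
        tokens.append(word)                   # feed the word back in...
        print(word, end=" ", flush=True)      # ...and show it immediately
    print()

generate(["the"])
```

Printing inside the loop is exactly the “streaming” you see in the chat window: each word appears the moment the iteration that produced it finishes.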

Anonymous 0 Comments

It is actually doing it in small pieces called tokens (often fragments of words), because it’s not thinking, just running statistics over a huge body of text to estimate what token comes next given the prompt and the general chances of ANY token coming next. There is no intelligence in these algorithms, just messy statistics that aren’t actually aiming for the correct answer, because we aren’t looking for correctness, just plausibility.
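
To make the “just statistics” point concrete, here is the crudest possible version in Python: count, for every word in a small corpus, which word tends to follow it, then “predict” the most common follower. Real models use tokens and a neural network instead of raw counts, and the corpus here is invented for illustration.

```python
from collections import Counter, defaultdict

# A made-up miniature "training corpus".
corpus = "the cat sat on the mat and the cat slept on the rug".split()

# For each word, count which words have followed it.
following = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    following[current][nxt] += 1

def predict_next(word):
    # The most plausible continuation = the follower seen most often.
    counts = following[word]
    return counts.most_common(1)[0][0] if counts else None

print(predict_next("the"))  # -> "cat" ("the" is followed by "cat" twice)
print(predict_next("sat"))  # -> "on"
```

Notice that nothing here checks whether “cat” is correct; it is just the statistically most plausible next word given the data.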

Anonymous 0 Comments

Because that’s exactly how it works. LLMs are, at an extremely basic level, playing “guess the next word” based on a giant archive of text data.

Anonymous 0 Comments

Because they are pretending that you are chatting with a real person: the thing has already made the text, and the user interface presents it to you as if someone is typing.

We did the same thing back in college with chat bots, complete with random typos that you’d see it backspace.

Anonymous 0 Comments

These top responses are not quite correct. Language models do not just generate word by word. They would show obvious signs of semantic error if they did. Models are very much able to take in different layers of context to decide how to generate text.

The reason you see ChatGPT generate responses word by word is that the designers built it that way. My guess is they wanted you to “see” the text generation. It’s an interface decision, not a consequence of how models generate text.

Anonymous 0 Comments

The way the software is written, it comes up with a response one “word” at a time. I put “word” in quotes because sometimes the next “word” is not really a word that you see on the screen. For example, it could be a special marker meaning “this is the end of the message”.

Each word takes a lot of computation. That requires time, energy, computing resources such as CPUs and GPUs running on a server somewhere, and cooling. Compared to other things that computers do, computing the next word with a model like GPT-4 takes a large amount of computation, multiplied by however many people are using the service at the same time.

If it waited to send the entire message at once, the reader would just be sitting there. So they send it one word at a time, so you can start reading it even while it’s still being written. Another benefit is that you can see it is successfully writing and not just stuck.
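
A minimal Python sketch of that trade-off (the fake_model_stream generator and its delays are invented stand-ins for the real model and its per-word compute cost):

```python
import time
import random

def fake_model_stream(answer="each word takes real compute so it arrives one at a time"):
    # Stand-in for the model: "produce" one word at a time, with a
    # variable delay standing in for the compute cost of each step.
    for word in answer.split():
        time.sleep(random.uniform(0.05, 0.4))
        yield word

# The UI consumes the stream and shows each word the moment it exists,
# instead of buffering the whole answer and making the reader wait.
for word in fake_model_stream():
    print(word, end=" ", flush=True)
print()
```

Buffering the whole thing would cost the same total time; streaming just lets the reading overlap with the writing.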

Anonymous 0 Comments

Think of it as a progress bar. It’s being streamed to your UI more or less as it’s being generated.

Anonymous 0 Comments

Perhaps a better question: why does it delete the post after it makes it? I get why Copilot deletes my requests to talk about a free Tibet, and why Claude refuses to run a TTRPG in the World of Darkness, but why the fuck does Llama 3 delete every question I have about diets after answering the whole thing?

Anonymous 0 Comments

It’s streaming. You can run the same LLM in non-streaming mode, and it will show the response as a single block once it’s finished.
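
For example, with the OpenAI Python SDK (v1-style; the model name is only an example and details may vary by version), the same request can be made either way:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

question = [{"role": "user", "content": "Why is the sky blue?"}]

# Non-streaming: the server finishes the whole answer, then sends one block.
resp = client.chat.completions.create(model="gpt-4o-mini", messages=question)
print(resp.choices[0].message.content)

# Streaming: the server sends each chunk as soon as it is generated.
stream = client.chat.completions.create(
    model="gpt-4o-mini", messages=question, stream=True
)
for chunk in stream:
    piece = chunk.choices[0].delta.content
    if piece:  # the first chunk may carry only role metadata
        print(piece, end="", flush=True)
print()
```

Either way the model generates token by token on the server; streaming just ships the tokens as they appear instead of all at the end.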