ELI5: Why does ChatGPT give responses word-by-word, instead of the whole answer straight away?

This goes for almost all AI language models that I’ve used.

I ask it a question, and instead of giving me a paragraph instantly, it generates a response word by word, sometimes sticking on a word for a second or two. Why can’t it just paste the entire answer straight away?

In: Technology

28 Answers

Anonymous 0 Comments

Modern AI is really, truly just an advanced version of that thing where you keep hitting the middle word in autocomplete. It doesn’t know what word it will use next until it sees what word came last. It’s generating as it’s showing.

Anonymous 0 Comments

Because that’s how these answers are generated. A language model does not generate an entire paragraph of text at once. Instead it generates one word, then generates the next word that fits with what it has already generated, while also trying to stay within the context of your prompt.

It helps to stop thinking about these language model AIs as some kind of program acting like a person who writes you a response, and to think of them more as a program designed to make text that feels natural to read.

Like if you were just learning a new language and trying to form a sentence, you would most likely also go word by word trying to make sure the next word fits into the sentence.

That’s also why these language models can make totally wrong answers seem correct: everything is nicely put together and fits into the sentences and paragraphs, but the underlying information used to generate that text can be entirely made up.

edit:

Just wanna take a moment here to say these are really great discussions down here. Even if we are not all in agreement, there’s a ton of perspective to be gained.

Anonymous 0 Comments

It could just give you the whole thing after it is done, but then you would be waiting for a while.

It is generated word by word, and seeing progress makes the wait easier to sit through. So there is no reason for them to delay giving you the response.

Anonymous 0 Comments

Mostly, it’s a design choice for user comfort. Computers generate most things in sequence, but AI is quite slow compared to other software, so they decided to show every bit right away as it’s generated, so users don’t get impatient. Especially since LLMs generate quite long texts to begin with. It’s also more impressive. If an LLM were “thinking…” for 10+ seconds before each answer, you wouldn’t find it as cool.

Anonymous 0 Comments

It’s just not fast enough to give the whole answer straight away; getting the LLM to give you one ‘word’ at a time is called “streaming”, and in some cases it is something you have to deliberately turn on, otherwise you’d just be sitting there looking at a blank space for a minute before the whole paragraph just pops out.
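To make the difference concrete, here is a minimal Python sketch. It is not how any real service is implemented: `generate_words`, the sample sentence, and the per-word delay are all made up for illustration. The same slow generator is consumed two ways, once by waiting for everything and once by handing over each word as soon as it exists.

```python
import time
from typing import Iterator

SAMPLE = "Streaming shows each word as soon as it exists"


def generate_words(delay_s: float = 0.05) -> Iterator[str]:
    """Hypothetical stand-in for a slow model: one word per step."""
    for word in SAMPLE.split():
        time.sleep(delay_s)  # pretend the model is busy computing this word
        yield word


def respond_all_at_once(delay_s: float = 0.05) -> str:
    """No streaming: nothing is returned until the last word is done."""
    return " ".join(generate_words(delay_s))


def respond_streaming(delay_s: float = 0.05) -> Iterator[str]:
    """Streaming: each word is passed along the moment it is produced."""
    yield from generate_words(delay_s)


if __name__ == "__main__":
    # The reader sees the first word after one delay instead of all of them.
    for word in respond_streaming():
        print(word, end=" ", flush=True)
    print()
```

Both versions take the same total time and produce the same text. The only thing streaming changes is how soon you get to start reading.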

Anonymous 0 Comments

A lot of these answers that you’re getting are incorrect.

You see responses appear “word by word” so that you can begin reading as quickly as possible. Because most chat wrappers don’t allow the AI to edit previously written words, it doesn’t make sense to force the user to wait until the entire response is written to actually see it.

It takes actual time for the response to be written. When the response slowly trickles in, you’re seeing in real time how long it takes for that response to be generated. Depending on which model you use, responses might appear to form complete paragraphs instantly. This is merely because those models run so quickly that you can’t perceive the amount of time it took to write.

But if you’re using something like GPT-4, you see the response slowly trickle in because that’s literally how long it’s taking the AI to write it, and because right now ChatGPT isn’t allowed to edit words it’s already written, there is no point in waiting until it’s “done” before sending it over to you. Keep in mind that its lack of ability to edit words as it goes is an _implementation detail_ that will very likely start changing in future models.

Anonymous 0 Comments

Given all the text that has been written so far, it predicts the next word.
So when you ask “Who is Michael Jordan?”, it takes that sentence and predicts what the next word is. So it predicts “Michael”. Then, to predict the next word, it takes the text “Who is Michael Jordan? Michael” and predicts “Jordan”. Then it starts over again with the text “Who is Michael Jordan? Michael Jordan”. In the end it has “Who is Michael Jordan? Michael Jordan is a former basketball player for the Chicago Bulls”. So basically it takes a text and predicts the next word. That is why you get it word by word. It’s not really that advanced.
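The loop in that Michael Jordan example can be sketched in a few lines of Python. Big caveat: `predict_next_word` here is a hypothetical stand-in that just reads from a canned answer, so it is a cheat. A real model computes its prediction from the text with a neural network. The point is only the shape of the loop: predict one word, glue it onto the text, feed the longer text back in.

```python
from typing import List, Optional

PROMPT = "Who is Michael Jordan?"
# Canned answer standing in for what a real model would work out on its own.
ANSWER = "Michael Jordan is a former basketball player for the Chicago Bulls".split()


def predict_next_word(text: str) -> Optional[str]:
    """Toy predictor: looks at the text so far and returns the next word."""
    generated = text[len(PROMPT):].split()  # words produced after the prompt
    if len(generated) < len(ANSWER):
        return ANSWER[len(generated)]
    return None  # nothing left to say, so generation stops


def generate(prompt: str = PROMPT) -> List[str]:
    """Repeatedly predict one word and append it, recording every step."""
    text = prompt
    steps = []
    while True:
        word = predict_next_word(text)
        if word is None:
            break
        text = text + " " + word  # the new word becomes part of the input
        steps.append(text)
    return steps


if __name__ == "__main__":
    for step in generate():
        print(step)
```

Each printed line is the full text the predictor sees on its next turn, which is why the output can only ever grow one word at a time.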

Anonymous 0 Comments

I do find that quite interesting. I have some idea of where I am going as I write. Does the AI have no idea?

Anonymous 0 Comments

You could get a 30-second loading bar for every reply you get… But most people would drop the tool almost instantly, as our attention spans keep shrinking at a staggering pace.

As the things stand, it is much more desirable to have *immediate* output than having *complete* output.

Also, LLM technology currently works one word at a time, so the visual output simply reflects the output of the algorithm.

Anonymous 0 Comments

That’s the whole game… it’s doing massive amounts of math to decide the next word that makes sense
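For a rough feel of the last step of that math, here is a small Python sketch. The candidate words and their scores are invented for illustration. In a real model the scores come out of a huge neural network and cover tens of thousands of possible pieces of text, and the next piece is often sampled from the probabilities rather than always taking the top one.

```python
import math
from typing import Dict


def softmax(scores: Dict[str, float]) -> Dict[str, float]:
    """Turn raw scores into probabilities that add up to 1."""
    highest = max(scores.values())
    # Subtracting the highest score first keeps exp() from overflowing.
    exps = {word: math.exp(s - highest) for word, s in scores.items()}
    total = sum(exps.values())
    return {word: e / total for word, e in exps.items()}


# Made-up scores for what might follow "Who is Michael Jordan? Michael"
scores = {"Jordan": 5.0, "Jackson": 2.0, "banana": -1.0}

probs = softmax(scores)
next_word = max(probs, key=probs.get)  # simplest choice: take the most likely word
```

All of that has to be redone for every single word, which is a big part of why the answer trickles out instead of appearing at once.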