Why do models like ChatGPT forget things during conversations or make things up that are not true?


Anonymous 0 Comments

It looks like a chat to you, with history, but to GPT, every time you send a message, it’s a brand-new “person” with no memory of you. With every message you send, it receives your text along with a bit of context drawn from the recent conversation.

It’s like talking to a grandmother who has dementia.
Whenever you say something, even in the middle of the conversation, as far as she knows it’s the first thing you’ve said to her. But then, based on the words and concepts you used, her brain goes “hey, that vaguely connects to something” and brings part of that “something” up.
So she’s able to answer you semi-coherently, even though to her you’re a stranger, and her answer is based only on your last message and a few vague, imprecise memories of things you’ve said or that she used to know.
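To make that concrete, here’s a rough sketch of what the app around the model typically does (none of this is a real API; every name and number is invented for illustration): the app keeps the transcript itself and re-sends as much of it as fits every time you hit send. Whatever doesn’t fit is simply never seen by the model again.

```python
# Hypothetical sketch: the chat *app* keeps the history and re-sends it every turn.
# `call_model` stands in for whatever actually generates the reply.

MAX_CONTEXT_TOKENS = 4000  # invented limit, purely for illustration


def rough_token_count(text):
    # Very crude stand-in for a real tokenizer: just count words.
    return len(text.split())


def build_prompt(history, new_message):
    # Work backwards from the newest message, keeping only as much
    # history as fits in the context budget; older lines are dropped.
    kept = [new_message]
    budget = MAX_CONTEXT_TOKENS - rough_token_count(new_message)
    for old in reversed(history):
        cost = rough_token_count(old)
        if cost > budget:
            break  # everything older than this is "forgotten"
        kept.insert(0, old)
        budget -= cost
    return "\n".join(kept)


def chat_turn(history, new_message, call_model):
    prompt = build_prompt(history, new_message)
    reply = call_model(prompt)            # the model only ever sees `prompt`
    history.extend([new_message, reply])  # the app remembers, not the model
    return reply


# Dummy "model" for demonstration: it just reports how much text it was shown.
history = []
print(chat_turn(history, "Hi, my name is Alex.",
                lambda p: f"(model saw {rough_token_count(p)} words)"))
```

The only point of the sketch is that the “memory” lives in the app’s history list, not in the model itself.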

Anonymous 0 Comments

You’re getting loads of opinionated answers, and many people arguing over what it means to “think” or not, which gets very philosophical and, I think, isn’t suitable for an ELI5 explanation.

To answer your question: ChatGPT repeats patterns it learned from reading loads of sources (the internet, books, etc.), so it produces whatever is most likely to appear as the answer to your question. If a wrong answer is repeated often enough in those sources, ChatGPT will treat it as the right answer, and in that case it will be wrong.
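A toy illustration of “most repeated wins”, with a made-up miniature training set; real data and real models are vastly bigger, but the failure mode is the same:

```python
from collections import Counter

# Invented miniature "training data": answers that appear next to a question online.
training_answers = [
    "The Great Wall of China is visible from space.",   # popular myth, repeated a lot
    "The Great Wall of China is visible from space.",
    "The Great Wall of China is visible from space.",
    "The Great Wall of China is not visible to the naked eye from orbit.",  # correct, but rarer
]

# A model trained on this data learns the statistics, not the truth:
most_common_answer, count = Counter(training_answers).most_common(1)[0]
print(most_common_answer)  # the myth wins, simply because it showed up more often
```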

Anonymous 0 Comments

Most comments here state that ChatGPT is stupid and doesn’t know anything, but there is an interesting phenomenon in nature that is pretty much how ChatGPT works: swarm intelligence (in ChatGPT’s case, a lot of transformers stuck together). This has been shown time and time again with ants and many other naturally occurring systems. Even cells (yes, your cells too) are individually really simple and stupid, but by combining many stupid things you get something not so stupid (some would even call it smart).

Although it is true that ChatGPT “only” predicts the next word, and that it uses numbers (tokens) to represent those words, I would not call it simple or stupid. The reason is that to predict the next token, it has to “understand” the relationships between those tokens. ChatGPT doesn’t have a model of the world inside, so it doesn’t know what a word actually means or what the object it names is, but it still needs to capture that one word has a certain relationship with another; if it couldn’t, it wouldn’t be able to produce coherent sentences. That doesn’t mean it understands the words themselves, but it must, at least to a certain degree, understand the relationships between them.

Now here comes the interesting part: LLMs seem to show “emergent abilities” that were never trained into the model at all (Google’s account of Bard picking up a language it had no reference to in its training data would be one example). The same phenomenon shows up in swarm intelligence: a single ant is extremely stupid, but as part of a swarm it can do amazing things.

So, coming full circle: yes, ChatGPT has no concept of our world whatsoever. That said, it has an internal “world view” (I’m calling it a world view for simplicity; it’s really an understanding of relationships between tokens). That “world view” sometimes lets it solve things that are not in its training data, purely through the relationships between its tokens. Does that make ChatGPT or LLMs smart? I wouldn’t say so, but I also wouldn’t call them stupid.

(One Article with links to the papers about emerging abilities: https://virtualizationreview.com/articles/2023/04/21/llm-emergence.aspx?m=1)
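To make “relationships between tokens” a bit more concrete, here is a tiny, entirely made-up sketch of how such relationships are usually represented: each token gets a list of numbers (an embedding), and related tokens end up with similar numbers. The three-number vectors below are invented by hand; real models learn vectors with hundreds or thousands of dimensions.

```python
import math

# Hand-invented 3-number "embeddings"; real models learn much longer vectors from data.
embeddings = {
    "sheep":    [0.9, 0.8, 0.1],
    "farm":     [0.8, 0.9, 0.2],
    "keyboard": [0.1, 0.1, 0.9],
}


def cosine_similarity(a, b):
    # Standard measure of how similarly two vectors point: 1.0 = same direction, 0.0 = unrelated.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))


print(cosine_similarity(embeddings["sheep"], embeddings["farm"]))      # high: closely related
print(cosine_similarity(embeddings["sheep"], embeddings["keyboard"]))  # low: barely related
```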

Anonymous 0 Comments

So, I see a confidently wrong answer here: that it doesn’t “understand”.

It absolutely does develop an understanding of the relationships between words, based on their structure and usage.

Rather, AI as it stands today has “limited context”, the same way humans do. If I said a bunch of stuff to you that you didn’t end up paying close attention to, and then talked about something else, how much would you really remember of the dialogue?

As it is, as a human, this same thing happens to me.

It has nothing to do with what is or isn’t understood of the contents; it’s simply an inability to pay attention to too much stuff at the same time. Eventually, new stuff in the buffer pushes out the old stuff.
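A minimal picture of “new stuff pushes out the old stuff”: a fixed-size buffer that silently evicts its oldest contents. The window of 5 words below is obviously invented; real models track thousands of tokens, but the eviction works the same way.

```python
from collections import deque

# A fixed-size buffer: once full, adding something new silently evicts the oldest item.
context_window = deque(maxlen=5)  # real context windows hold thousands of tokens, not 5

for word in "the cat sat on the mat and then fell asleep".split():
    context_window.append(word)

print(list(context_window))
# ['mat', 'and', 'then', 'fell', 'asleep'] -- "the cat sat on the" has been pushed out
```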

Sometimes you write things down on a piece of paper to study later (to do training on), but the fact is that I don’t remember a single thing about what I did two days ago. A week ago? LOL.

Really, it forgets stuff because nothing can remember everything indefinitely. The very rare people who actually do remember everything would not recommend the activities they are compelled to engage in to maintain that recall: it actually damages their ability to look at information contextually, just as you can’t take a “leisurely sip” from a firehose.

As for making things up that aren’t true: we explicitly trained it, tuned it, and built its very base model from a dataset in which every presented response to a query was a confidently given answer, so the way the LLM understands a question is “something that must be answered the way a confident AI assistant who knows the answer would answer it”.

If the requirement had been to reflect uncertainty where it is warranted, I expect many people would be dissatisfied with the output, since the AI would give many answers with uncertainty even when humans are confident that the LLM must know and state the answer… even when the answer may not actually be that accessible or accurate.

The result is that we trained something that is more ready to lie than to invite the thing that has “always” happened before whenever it produced a bad answer: a correction (the backpropagation stimulus).

Anonymous 0 Comments

It does not give answers.

Re-frame your thinking this way: it gives you text that is supposed to look like something a human could give you as a response to your input (question). It just so happens that the text it finds most related to your input tends to be what you’re looking for and what you would consider the “right answer”.

The following is not how it works in reality, but should help you understand how these language models work in general:

The AI takes the words in your input and searches for the contexts they have been used in before, to determine the associations. For example, it can figure out that when you ask about sheep, it should associate that with animals, farming, and food.

So it then searches for the text that is best associated with all of those meanings.

Then it looks for the most common format for presenting that kind of text.

Then it rewrites the text it found to be best associated, using the formatting (and wording) of that kind of text.

At no point does it actually understand what it is saying. All it understands is that the words sheep, farming, and animal are associated with an article it found that discusses planting (because of farming) on a farm (because of animal). So it gives you that information re-formulated in a way that looks like a suitable answer.

That’s why if you ask it “How deep do you plant sheep?” it might actually answer you that it depends on the kind of sheep and the quality of soil, but usually about 6 inches.

Again, please note that this is not actually what happens. Whether there are any such distinct steps is something only the AI’s creators know. But the method of association is very real and very widely used. That’s the “Deep Learning” or “[Neural Networks](https://www.youtube.com/watch?v=aircAruvnKk)” that everyone talks about when they discuss AI.
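To show how pure association can produce the “planting sheep” answer above, here’s a toy sketch that is hypothetical in every detail: each word points at a topic, each topic points at canned text, and the pieces get stitched together with no check that the question even makes sense.

```python
# Invented toy "association tables"; nothing here resembles a real model's internals.
word_associations = {
    "plant": "gardening",
    "deep":  "gardening",
    "sheep": "farming",
}

canned_facts = {
    "gardening": "it depends on the quality of the soil, but usually about 6 inches deep",
    "farming":   "it depends on the kind of sheep",
}


def answer(question):
    # Collect whichever topics the question's words happen to point at...
    topics = {word_associations[w]
              for w in question.lower().rstrip("?").split()
              if w in word_associations}
    # ...and stitch their canned text together, with no sanity check at all.
    return "Well, " + " and ".join(canned_facts[t] for t in sorted(topics)) + "."


print(answer("How deep do you plant sheep?"))
# -> Well, it depends on the kind of sheep and it depends on the quality of the soil,
#    but usually about 6 inches deep.
```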

Anonymous 0 Comments

Though I’m sure tech gurus will disagree, so-called AI models like ChatGPT and others do not display intelligence. Rather, they are best thought of as highly sophisticated pattern-recognition filters and pattern predictors. Put simply, through processes of “machine learning”, these programs have been given access to terabytes upon terabytes of data. The machine then finds patterns to predict the outcome of a query (a question) or a conversation (still a question, as far as the machine is concerned), with varying degrees of success.

Now, as some commenters have crucially pointed out, it does not assign meaning to this data. It cannot discern what an apple IS. It doesn’t know what you MEAN when you ask it “why is the sky blue”. Instead, the machine is looking back at poems, history books, and chemistry books for every association between the words “sky” and “blue”, and seeing what the likeliest answer is. Not necessarily even the correct answer, just the likeliest. Those two things are not mutually exclusive (obviously).

Anonymous 0 Comments

There’s an old thought experiment called “The Chinese Room”. In it, a person sits in a closed-off room with a slot in the door. That person only speaks English, but they are given a magical book that contains every possible Chinese phrase, along with an appropriate response to each phrase, also in Chinese. The person receives messages in Chinese through the slot in the door, writes down the appropriate response, and passes it back through the slot. To anyone passing messages in, the person on the inside would be indistinguishable from someone fluent in Chinese, even though they don’t actually understand a single word of it.

ChatGPT and other LLMs (Large Language Models) are essentially that. It doesn’t actually _understand_ what it’s saying; it just has a “magic translator book” that says things like “if I receive these words next to each other, respond with these words” and “if I already said this word, there’s a 50% chance I should put this word after it”. This makes it really likely that when it rolls the dice on what it’s going to say, the words work well together, but the concept itself might be completely made up.
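A minimal sketch of that “magic book plus dice roll” idea; the table of next words and every probability in it are invented purely for illustration:

```python
import random

# Invented "book": for a given previous word, the words that tend to follow it,
# with made-up probabilities.
next_word_table = {
    "the":   [("sky", 0.5), ("cat", 0.3), ("moon", 0.2)],
    "sky":   [("is", 0.9), ("falls", 0.1)],
    "is":    [("blue", 0.6), ("green", 0.4)],  # "green" is wrong, but it's in the table
    "blue":  [(".", 1.0)],
    "green": [(".", 1.0)],
}


def continue_sentence(start, steps=4):
    words = [start]
    for _ in range(steps):
        options = next_word_table.get(words[-1])
        if not options:
            break
        choices, weights = zip(*options)
        words.append(random.choices(choices, weights=weights)[0])  # roll the dice
    return " ".join(words)


print(continue_sentence("the"))  # fluent-sounding, but sometimes "the sky is green ."
```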

In order to “remember” things, it basically has to re-process everything that was already said in order to give an appropriate response. LLMs have a limit to how much they can process at once, and since what has already been said keeps getting longer, eventually the conversation gets too long for the model to look that far back.

Anonymous 0 Comments

ChatGPT puts words together in a familiar way. It doesn’t quite “know” things in the way you and I know things, at least not yet. For example, if you asked an AI that had fairy tales in its training data to tell the story of the Titanic, it could easily tell the story and then end it with the words *…and they all lived happily ever after…* simply because the stories in its training end that way.

Note, though, that what would constitute AI sentience is not well understood at this stage.

Anonymous 0 Comments

This will likely be somewhat corrected over time. I assume it reads all information mostly uncritically, and algorithms will probably be tweaked to give more weight to more reliable sources, or to take into account rebuttals of disinformation.

Anonymous 0 Comments

Imagine writing a text message by just tapping the next word your phone’s autocomplete suggests. That’s what LLMs do, except with a far larger dataset.
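A bare-bones version of that autocomplete idea, “trained” by counting which word follows which in a tiny made-up set of texts:

```python
from collections import Counter, defaultdict

# Tiny invented "dataset"; a phone keyboard learns from your texts, an LLM from far more text.
corpus = "i am on my way home . i am running late . i am on the bus ."

# Count which word tends to follow which.
follows = defaultdict(Counter)
words = corpus.split()
for current, nxt in zip(words, words[1:]):
    follows[current][nxt] += 1


def autocomplete(start, length=5):
    sentence = [start]
    for _ in range(length):
        options = follows.get(sentence[-1])
        if not options:
            break
        sentence.append(options.most_common(1)[0][0])  # always pick the likeliest next word
    return " ".join(sentence)


print(autocomplete("i"))  # "i am on my way home" -- plausible-sounding, whether or not it's true
```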