Why can’t LLMs like ChatGPT calculate a confidence score when providing an answer to your question and simply reply “I don’t know” instead of hallucinating an answer?


It seems like they all happily make up a completely incorrect answer and never simply say “I don’t know”. It seems like hallucinated answers come up when there’s not a lot of information to train them on a topic. Why can’t the model recognize the low amount of training data and generate a confidence score to determine whether it’s making stuff up?

EDIT: Many people rightly point out that the LLMs themselves can’t “understand” their own responses and therefore cannot determine whether their answers are made up. But I guess the question includes the fact that chat services like ChatGPT already have support services like the Moderation API that evaluate the content of your query and its own responses for content moderation purposes, and intervene when the content violates their terms of use. So couldn’t you have another service that evaluates the LLM response for a confidence score to make this work? Perhaps I should have said “LLM chat services” instead of just LLM, but alas, I did not.


25 Answers


ChatGPT is kind of like someone who got really good at one game and was later asked to play another. The first game is this: given a text from somewhere like Wikipedia, CNN, or even Reddit, guess what the next word will be after I cut it off at a random point. You get partial credit if you guess a word that means roughly the same thing as the real next word.
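If it helps to see the first game as code, here is a minimal sketch of it. It is a toy word-pair counter, not how ChatGPT actually works (real models use neural networks over much longer contexts), and the names `train_bigram`, `predict_next`, and the tiny `corpus` are made up for illustration.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Count, for every word, which words followed it in the training text."""
    counts = defaultdict(Counter)
    words = text.lower().split()
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts

def predict_next(counts, word):
    """Return (guess, probability) for the most frequent follower of `word`."""
    followers = counts.get(word.lower())
    if not followers:
        # The word never appeared with a follower in training.
        return None, 0.0
    guess, n = followers.most_common(1)[0]
    return guess, n / sum(followers.values())

# Hypothetical toy "training text"
corpus = "the cat sat on the mat and the cat ate the fish"
model = train_bigram(corpus)

print(predict_next(model, "the"))    # ('cat', 0.5)
print(predict_next(model, "zebra"))  # (None, 0.0)
```

Note that the toy model always returns whatever word was most common after the cut-off point. Nothing in the game rewards it for stopping to say it isn't sure.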

Eventually ChatGPT’s ancestors got pretty good at this game, and it was somewhat useful, but it had a lot of problems with hallucinations. It also kind of felt like playing Jeopardy: you’d have to enter text that looked like the kind of text that would precede the answer you wanted. This approach ended up with even more hallucination than we have now.

So what people did was ask it to play a slightly different game. Now they gave it part of a chat log and asked it to predict the next message, and they started having humans rate the answers. The game became producing a new message that would get a good rating. Over time ChatGPT got good at this game too, but most of what it had learned still came from the first game of predicting the next word on websites, and in that game there weren’t many examples of someone admitting they didn’t know the answer. That made it difficult to get ChatGPT to admit it didn’t know something. It was more likely to guess, because a guess seemed far more likely to be the next part of a website than an admission of not knowing. Over time we’re getting more and more training data of chat logs with ratings, so I expect the situation to improve somewhat.
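To make the "admissions are rare in the training text" point concrete, here is a rough sketch. The 97-to-3 split is an invented number purely for illustration, not a measurement of any real dataset, and `reply_distribution` is a hypothetical helper.

```python
from collections import Counter

# Hypothetical toy data: the kind of reply that followed a question
# in the "training text". The 97/3 split is made up for illustration.
observed_replies = ["confident answer"] * 97 + ["i don't know"] * 3

def reply_distribution(replies):
    """Turn raw counts of reply types into probabilities."""
    counts = Counter(replies)
    total = sum(counts.values())
    return {kind: n / total for kind, n in counts.items()}

dist = reply_distribution(observed_replies)

# A predictor that imitates the training text picks the likeliest reply type.
most_likely = max(dist, key=dist.get)

print(dist)
print(most_likely)  # confident answer
```

A model that imitates this kind of text will lean toward a confident-sounding reply no matter whether it has good information, because that is what usually came next. Rated chat logs are one way to shift those odds.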

Also see [this answer](https://www.reddit.com/r/explainlikeimfive/comments/1dsdd3o/comment/lb1ycs8/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button) from /u/[BullockHouse](https://www.reddit.com/user/BullockHouse/). I more or less agree with it, but I wanted to provide a slightly simpler explanation. I think the right way to understand the modern crop of models is often to understand deeply what tasks they were taught to do and exactly what training data went into that.
