It seems like they all happily make up a completely incorrect answer rather than simply saying “I don’t know”. Hallucinated answers seem to appear when there isn’t much information on a topic in the training data. Why can’t the model recognize that it has little training data on a subject and attach a confidence score to its output, so you could tell when it’s making stuff up?
EDIT: Many people rightly point out that the LLMs themselves can’t “understand” their own responses and therefore cannot determine whether their answers are made up. But the question also covers the fact that chat services like ChatGPT already have support services like the Moderation API that evaluate the content of your query and the model’s own responses for content moderation purposes, and intervene when the content violates their terms of use. So couldn’t you have another service that evaluates the LLM response and produces a confidence score? Perhaps I should have said “LLM chat services” instead of just LLM, but alas, I did not.
It doesn’t *never* say “I don’t know,” but it is rare.
—
The model doesn’t inherently know how much training data it has. Its “knowledge” is a series of numbers in an abstract web of correlations between ‘tokens’ (i.e., groupings of letters).
My understanding is that internally, the base GPT structure does have an internal confidence score that seems moderately well calibrated. However, in the fine-tuning to ChatGPT, that confidence score seems to go to extremes. I recall reading something like that from the relevant people working on GPT-3.
My opinion is that responses that don’t answer questions or sound unconfident get downvoted in the human reinforcement training stage. This has the benefit of making it answer questions more often (which is the goal of the product), but has the side effect of overconfidence when its answer is poor.
While an LLM can’t really think for itself (yet), you can reduce hallucinations if you write your prompts in a way that leaves “not knowing” as a correct answer.
Example: “Give me the name of the 34th US president.” – it’s a bad prompt, because you are ordering the model to spit out a name, and it’s likely to hallucinate one if it wasn’t trained on that.
A better prompt would be: “Given your historical knowledge of US presidents, do you know the name of the 34th US president?” – it’s a good prompt, because now the LLM has room to say it doesn’t know, should that be the case.
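To make that concrete, here is a minimal sketch of both prompt styles sent through the OpenAI Python client; the model name and exact wording are placeholders chosen for illustration, not anything the commenter specified:

```python
# Minimal sketch, assuming the OpenAI Python client (openai >= 1.0) is installed
# and an API key is configured in the environment. The model name is a placeholder.
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Bad: forces the model to produce a name, even if it has to invent one.
print(ask("Give me the name of the 34th US president."))

# Better: explicitly leaves "I don't know" as an acceptable answer.
print(ask("Given your historical knowledge of US presidents, do you know "
          "the name of the 34th US president? If you are not sure, say so."))
```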
Figuring out how to do that without dramatically lowering the general usefulness of the program is a very active area of research in machine learning circles.
Some systems do have confidence scores for their answers. IBM Watson, for instance, did that during its famous Jeopardy run. But then, those were much more controlled conditions than what ChatGPT runs under.
I imagine that a solution to hallucinations that could be applied broadly would be the sort of thing that gets you considered for a Turing Award (computer science’s Nobel Prize).
The true answer to this question is that researchers aren’t completely sure how to do this. The models don’t know their own confidence, and no one knows how to teach them to.
This is actually a great research topic if you’re a master’s or PhD student. Asking these kinds of questions is how it gets figured out.
It actually kind of can.
I’d highly recommend this whole video from 3Blue1Brown, but focus on the last two sections, on probability distributions and the softmax function.
Essentially, the LLM guesses one token (sentence fragment/word) at a time, and it actually could tell you its confidence in each word it generates. If the model is confident in the next-word guess, that shows up as one high probability; when the model is not confident, the 2nd- and 3rd-best options have probabilities close to the highest one. There is no actual understanding or planning going on, it’s just guessing one word at a time, but it can be uncertain when making those guesses.
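If you want to see those per-token probabilities yourself, here is a rough sketch using the Hugging Face `transformers` library and the small, freely available GPT-2 model (chosen here only for illustration; it is not the model the commenter is describing):

```python
# Rough sketch: inspect a model's per-token "confidence" for the next word.
# Assumes the `transformers` and `torch` packages are installed.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

prompt = "The 34th president of the United States was"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits        # shape: (1, seq_len, vocab_size)

# Softmax turns the raw scores for the *next* token into probabilities.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, k=5)

# If the top probability dwarfs the runners-up, the model is "confident";
# if the top few are close together, it is effectively guessing.
for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(int(token_id))!r:>15}  {prob.item():.3f}")
```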
One key part of generative models is the “creativity” or temperature of their generations, which is really just choosing those 2nd- and 3rd-best options from time to time. The results can get wacky, and some reliability in producing accurate results is lost, but always selecting the top choice often produces stiff, inflexible answers that are inappropriate for chatbot conversation. In this context, the AI is never giving you an answer it’s “confident” in, but rather stringing together words that probably come next and spicing them up with some variance.
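Here is a tiny toy sketch (with made-up numbers) of how temperature reshapes a next-token distribution before sampling:

```python
# Toy sketch of temperature sampling over a made-up next-token distribution.
import numpy as np

rng = np.random.default_rng(0)
tokens = ["Eisenhower", "Truman", "Kennedy", "banana"]
logits = np.array([4.0, 3.6, 3.5, 1.0])   # raw scores for four candidate tokens

def sample(logits, temperature):
    # Dividing by the temperature flattens (T > 1) or sharpens (T < 1)
    # the distribution before the softmax.
    scaled = np.exp(logits / temperature)
    probs = scaled / scaled.sum()
    return rng.choice(tokens, p=probs), probs

for t in (0.2, 1.0, 2.0):
    choice, probs = sample(logits, t)
    print(f"T={t}: probs={np.round(probs, 2)}, sampled -> {choice}")
```

At low temperature the model almost always emits the top token; at higher temperature those 2nd- and 3rd-best options get picked noticeably often, which is exactly where the “wacky” answers come from.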
Now, why doesn’t the AI at least do a basic double check of the answer it gives you? That would help catch some obviously wrong and internally contradictory things. Well, that check requires invoking the whole LLM again, which literally doubles the computation ($) needed to produce an answer. So while an LLM could tell you how confident it was in each word it prints, and then holistically double-check the response, that’s not exactly the same as what you’re asking for.
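A minimal sketch of that double-check pattern, with a hypothetical `call_llm` helper standing in for whatever chat-completion API you use, might look like this:

```python
# Hypothetical sketch of a "double check" pass. `call_llm` is a stand-in for a
# real chat API call, not an existing library function.
def call_llm(prompt: str) -> str:
    raise NotImplementedError("wire this up to your chat API of choice")

def answer_with_self_check(question: str) -> str:
    draft = call_llm(question)                       # first full model invocation
    verdict = call_llm(                              # second full invocation: ~2x the cost
        "Does the following answer contain claims that are obviously wrong or "
        f"self-contradictory? Reply YES or NO.\n\nQuestion: {question}\nAnswer: {draft}"
    )
    if verdict.strip().upper().startswith("YES"):
        return "I'm not confident in my answer to that question."
    return draft
```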
The LLM doesn’t have knowledge like ours to make a judgement call about something like confidence, but it does process information in a very inhuman, robotic way that looks like “confidence,” and understanding and minimizing hallucinations is hugely important in the field of AI interpretability. But I doubt anybody except some PhDs would want to see every word of output accompanied by every other word it could have chosen and its % chance relative to the other options.
I suggest you read the book [On Bullshit by Harry Frankfurt](https://www.goodreads.com/book/show/385.On_Bullshit). Why? Because ChatGPT is the ultimate bullshitter, and to really understand ChatGPT, you have to understand what that means.
Bullshitters misrepresent themselves to their audience not as liars do, that is, by deliberately making false claims about what is true. In fact, bullshit need not be untrue at all. Rather, bullshitters seek to convey a certain impression of themselves without being concerned about whether anything at all is true.
ChatGPT’s training has designed it to do one thing and one thing only: produce output that the typical reader will like. Its fitness function doesn’t consider the truth or falsity of a statement. It doesn’t even know what truth or falsehood means. It boldly states things instead of saying “I don’t know” because people don’t like hearing “I don’t know” when asking a question. It expresses itself confidently, with few weasel words, because people don’t like to hear equivocation.
ChatGPT is kind of like someone who got really good at one game and then later got asked to play another. The first game is this: given a text, like Wikipedia, CNN, or even Reddit, guess what the next word will be after I randomly cut it off. You’ll get partial credit if you guess a word that kind of means the same thing as the real next word.
Eventually ChatGPT’s ancestors got pretty good at this game, and it was somewhat useful, but it had a lot of problems with hallucinations. It also kind of felt like Jeopardy to use: you had to enter text that looked like the kind of text that would precede the answer you wanted, and this approach produced even more hallucination than we have now. So people asked it to play a slightly different game. Now they gave it part of a chat log and asked it to predict the next message, but they started having humans rate the answers; the game became producing a new message that would get a good rating. Over time ChatGPT got good at this game too, but it had still mostly learned by playing the first game of predicting the next word on websites, and in that game there weren’t many examples of someone admitting they didn’t know the answer. That made it difficult to get ChatGPT to admit it didn’t know something; it was more likely to guess, because a guess seemed far more likely to be the next part of a website than an admission of not knowing. Over time we’re getting more and more training data of chat logs with ratings, so I expect the situation to improve somewhat.
Also see [this answer](https://www.reddit.com/r/explainlikeimfive/comments/1dsdd3o/comment/lb1ycs8/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button) from /u/[BullockHouse](https://www.reddit.com/user/BullockHouse/), because I more or less agree with it, but I wanted to provide a slightly simpler explanation. I think the right way to understand the modern crop of models is often to deeply understand what tasks they were taught to do and exactly what training data went into that.
They literally don’t know what a fact is. They only sorta understand what a fact looks like. That is the only way they could ever work. They can’t “know” anything, so there’s no way to check for accuracy. Including a function to say “I don’t know” would either give people false confidence in the answers it can return, or make it so scared to give an answer that it becomes useless.
It’s not about “hallucinating” answers – these AI models, like ChatGPT, can’t just say “I don’t know” because they’re built to generate responses based on patterns in data they’ve seen. When they don’t have much info on a topic, they might spit out something off-base. Adding a confidence score isn’t easy ’cause these models don’t really understand uncertainty like we do. They’re all about patterns, not gut feelings. So, until they get better at sensing when they’re clueless, you might still get those wonky answers sometimes.