Why can’t LLMs like ChatGPT calculate a confidence score when providing an answer to your question and simply reply “I don’t know” instead of hallucinating an answer?


It seems like they all happily make up a completely incorrect answer and never simply say “I don’t know”. Hallucinated answers seem to come up when there isn’t much information to train them on a topic. Why can’t the model recognize how little training data it has on a subject and generate a confidence score to determine whether it’s making stuff up?

EDIT: Many people rightly point out that the LLMs themselves can’t “understand” their own responses and therefore cannot determine whether their answers are made up. But I guess the question includes the fact that chat services like ChatGPT already have support services, like the Moderation API, that evaluate the content of your query and the model’s own responses for content-moderation purposes and intervene when the content violates their terms of use. So couldn’t you have another service that evaluates the LLM’s response and assigns a confidence score to make this work? Perhaps I should have said “LLM chat services” instead of just “LLMs”, but alas, I did not.

In: Technology

25 Answers

Anonymous 0 Comments

It actually kind of can.

I’d highly recommend this whole video from 3Blue1Brown, but especially the last two sections, on probability distributions and the softmax function.

Essentially, the LLM guesses one token (word or word fragment) at a time, and it actually could tell you its confidence in each word it generates. If the model is confident in the next word, that shows up as one option with a high probability. When it isn’t confident, the 2nd- and 3rd-best options end up with probabilities close to the top choice. There is no actual understanding or planning going on; it’s just guessing one word at a time, but it can be uncertain while making those guesses.
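To make that concrete, here’s a tiny sketch (the candidate tokens and scores are made up for illustration, not taken from any real model) of how the model’s raw scores for the next token get turned into a probability distribution with softmax, and what a “confident” vs. an “uncertain” next-word guess looks like:

```python
# Minimal sketch: turning raw next-token scores (logits) into probabilities
# with softmax, and reading off a per-token "confidence".
import math

def softmax(logits):
    # Subtract the max for numerical stability, then normalize to sum to 1.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical candidates for the next token, with made-up scores.
candidates = ["Paris", "Lyon", "London", "banana"]

confident_logits = [9.0, 4.0, 3.5, -2.0]   # one clear winner
uncertain_logits = [4.1, 4.0, 3.9, -2.0]   # top options nearly tied

for label, logits in [("confident", confident_logits),
                      ("uncertain", uncertain_logits)]:
    probs = softmax(logits)
    ranked = sorted(zip(candidates, probs), key=lambda pair: -pair[1])
    print(label, [(tok, round(p, 3)) for tok, p in ranked])

# "confident": the top token takes ~99% of the probability mass.
# "uncertain": the top three tokens each get ~30-37%, i.e. the model is guessing.
```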

One key part of generative models is the “creativity”, or temperature, of their output, which is really just picking those 2nd- and 3rd-best options from time to time. The results can get wacky, and it definitely costs some reliability in producing accurate answers, but always selecting the top choice tends to produce stiff, inflexible answers that aren’t great for a chatbot conversation. In this context, the AI is never giving you an answer it’s “confident” in; it’s stringing together words that probably come next and spicing them up with some variance.
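Here’s a rough sketch of that temperature idea (again with made-up tokens and scores, assuming the same softmax as above): dividing the logits by a temperature before softmax sharpens or flattens the distribution, which controls how often the lower-ranked options get picked.

```python
# Sketch of temperature sampling: low temperature -> almost always the top
# token; higher temperature -> 2nd/3rd-best options get picked more often.
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_next_token(candidates, logits, temperature=1.0):
    # Scale the logits before softmax; smaller temperature sharpens the
    # distribution toward the top choice, larger temperature flattens it.
    scaled = [x / temperature for x in logits]
    probs = softmax(scaled)
    return random.choices(candidates, weights=probs, k=1)[0]

candidates = ["Paris", "Lyon", "London", "banana"]
logits = [4.1, 4.0, 3.9, -2.0]   # an "uncertain" guess: top options nearly tied

for t in (0.1, 1.0, 2.0):
    picks = [sample_next_token(candidates, logits, temperature=t)
             for _ in range(1000)]
    print(f"temperature={t}:", {tok: picks.count(tok) for tok in candidates})
```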

Now, why doesn’t the AI at least do a basic double check of the answer it gives you? That would help catch some obviously wrong or internally contradictory things. Well, that requires invoking the whole LLM again to run the check, which roughly doubles the computation ($) needed to produce an answer. So while an LLM could tell you how confident it was in each word it prints and then holistically double-check the response, that’s not exactly the same as what you’re asking for.
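In pseudocode-ish Python, that double-check flow looks something like the sketch below. The `generate()` function here is just a stand-in for one full (and therefore paid-for) call to a model, not any real API; the point is simply that verifying an answer means running the model a second time.

```python
# Hedged sketch of the "double-check" idea: two full model calls per question.

def generate(prompt: str) -> str:
    """Placeholder for a single LLM call; not a real API."""
    raise NotImplementedError("stand-in for one full model invocation")

def answer_with_self_check(question: str) -> str:
    draft = generate(question)                      # first full LLM call ($)

    check_prompt = (
        "Question: " + question + "\n"
        "Proposed answer: " + draft + "\n"
        "Does the answer contain unsupported or contradictory claims? "
        "Reply VERIFIED or UNSURE."
    )
    verdict = generate(check_prompt)                # second full LLM call ($$)

    if "UNSURE" in verdict:
        return "I'm not confident about this one."  # the "I don't know" path
    return draft
```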

The LLM doesn’t have knowledge the way we do, so it can’t make a judgement call about something like confidence, but it does process information in a very inhuman, robotic way that produces something that looks like “confidence”, and that signal is hugely important in the field of AI interpretability for understanding and minimizing hallucinations. But I doubt anyone outside of a few PhDs would want every word of output accompanied by every other word it could have chosen and its % chance relative to the other options.
