Why can’t LLMs like ChatGPT calculate a confidence score when providing an answer to your question and simply reply “I don’t know” instead of hallucinating an answer?

It seems like they all happily make up a completely incorrect answer and never simply say “I don’t know”. It seems like hallucinated answers come up when there’s not a lot of information to train them on a topic. Why can’t the model recognize the low amount of training data and generate a confidence score that indicates whether it is making stuff up?

EDIT: Many people rightly point out that the LLMs themselves can’t “understand” their own responses and therefore cannot determine if their answers are made up. But I guess the question includes the fact that chat services like ChatGPT already have support services like the Moderation API that evaluate the content of your query and its own responses for content moderation purposes, and intervene when the content violates their terms of use. So couldn’t you have another service that evaluates the LLM response for a confidence score to make this work? Perhaps I should have said “LLM chat services” instead of just LLM, but alas, I did not.

In: Technology

25 Answers

Anonymous 0 Comments

They can, and some do. There are two main approaches: one focuses on model explainability, and the other focuses on more classical confidence scoring, like the scores standard classifiers produce, usually obtained via techniques such as reflection.

This is usually done at the system level. You can also extract token probability distributions from most models, but you usually won’t be able to use them directly to produce an overall “confidence score”.
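To make the token probability point concrete, here is a minimal sketch, assuming you already have the raw per-step scores (logits) for a generated answer. The function names and the toy numbers are made up for illustration and do not come from any particular model or API. It turns the logits into per-token probabilities and averages them into a single number.

```python
import math

def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sequence_scores(step_logits, chosen_ids):
    """step_logits: one list of raw scores per generated token.
    chosen_ids: index of the token that was actually emitted at each step.
    Both are hypothetical inputs for this sketch."""
    logprobs = []
    for logits, tok in zip(step_logits, chosen_ids):
        probs = softmax(logits)
        logprobs.append(math.log(probs[tok]))
    mean_logprob = sum(logprobs) / len(logprobs)
    return {
        "token_probs": [math.exp(lp) for lp in logprobs],
        "mean_logprob": mean_logprob,
        # geometric mean of the token probabilities, always in (0, 1]
        "geo_mean_prob": math.exp(mean_logprob),
    }

# Toy example: a 3-token vocabulary and a 2-token answer.
step_logits = [[2.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
chosen_ids = [0, 1]
scores = sequence_scores(step_logits, chosen_ids)
print(scores)
```

The aggregate tells you how likely the model found that particular wording, not whether the claim is true. A fluent but wrong answer can score high, and a correct answer with many valid phrasings can score low, which is why the number can’t be used directly as a “confidence score”.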

That said, you usually shouldn’t expect to see any of those details if you only consume the model via an API. Providers do not want to expose metrics at this level of detail, since they can be employed for certain attacks against models, including model extraction and dataset inclusion disclosure.

As for the “I don’t know” part, you can definitely fine-tune an LLM to do that; however, its usefulness in most settings would then decrease drastically.
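The trade-off can be illustrated with a rough sketch. Note that this is a system-level wrapper with a threshold, not fine-tuning, and that the answers, scores, and function names below are invented purely for illustration. It assumes some external scorer has already attached a confidence value to each answer.

```python
def answer_or_abstain(answer, score, threshold):
    """Return the answer only if its (hypothetical) score clears the threshold."""
    return answer if score >= threshold else "I don't know"

def coverage(scored_answers, threshold):
    """Fraction of questions that still get a real answer at this threshold."""
    answered = sum(1 for _, s in scored_answers if s >= threshold)
    return answered / len(scored_answers)

# Made-up (answer, score) pairs; the scores do not come from a real model.
scored = [("Paris", 0.95), ("1969", 0.80), ("about 42", 0.55), ("Dr. Smith", 0.30)]

for threshold in (0.5, 0.9):
    replies = [answer_or_abstain(a, s, threshold) for a, s in scored]
    print(threshold, coverage(scored, threshold), replies)
```

With the stricter threshold, most replies become “I don’t know”, including some that might have been fine. Fewer wrong answers come at the cost of fewer answers overall.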

Hallucinations are actually quite useful. It’s quite likely that our own cognitive process does the same: we tend to fill gaps and recall incorrect facts all the time.

Tuning hallucinations out seems to drastically reduce the performance of these models in zero-shot settings, which are highly important for real-world applications.
