Because the specific task these Large Language Models are trained to perform is not to “answer the question”, but to “guess the next word”. It just so happens that getting good at “guessing the next word” is apparently a decent enough method for solving problems and generating answers to questions.
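To make “guess the next word” concrete, here is a toy sketch in Python. It is not how a real LLM works internally (those use neural networks trained on enormous amounts of text); it only counts which word tends to follow which in a tiny made-up corpus. The function names `train_bigram`, `guess_next_word` and `generate`, and the corpus itself, are invented for illustration.

```python
from collections import Counter, defaultdict


def train_bigram(text):
    """Count, for every word, which words follow it in the training text."""
    follow_counts = defaultdict(Counter)
    words = text.lower().split()
    for current_word, next_word in zip(words, words[1:]):
        follow_counts[current_word][next_word] += 1
    return follow_counts


def guess_next_word(follow_counts, word):
    """Return the most frequent follower of `word`, or None if never seen."""
    followers = follow_counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


def generate(follow_counts, prompt, max_words=5):
    """Extend the prompt one guessed word at a time."""
    words = prompt.lower().split()
    for _ in range(max_words):
        next_word = guess_next_word(follow_counts, words[-1])
        if next_word is None:
            break
        words.append(next_word)
    return " ".join(words)


# Hypothetical training text, chosen only for this example.
corpus = "the capital of france is paris . the capital of italy is rome ."
model = train_bigram(corpus)

print(guess_next_word(model, "capital"))          # of
print(generate(model, "the capital", max_words=1))  # the capital of
```

Notice that nothing in this code checks whether the output is true. It only picks whatever word most often came next in the text it saw, and any “answer” that comes out is a side effect of that.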