OpenAI says models trained to make up answers

2025-09-17 14:03:00

AI models often produce false outputs, or "hallucinations." Now OpenAI has admitted they may result from fundamental mistakes it makes when training its models.

The admission came in a paper [PDF] published in early September, titled "Why Language Models Hallucinate," and penned by three OpenAI researchers and Santosh Vempala, a distinguished professor of computer science at Georgia Institute of Technology. It concludes that "the majority of mainstream evaluations reward hallucinatory behavior."

Language models are primarily evaluated using exams that penalize uncertainty

The fundamental problem is that AI training rewards guesswork rather than admissions of uncertainty. A guess might produce a superficially plausible answer; telling users your AI can't find one is less satisfying.

As a test case, the team tried to get an OpenAI bot to report the birthday of one of the paper's authors, OpenAI research scientist Adam Tauman Kalai. It produced three incorrect answers, because training had taught the engine to return something rather than admit ignorance.

"Over thousands of test questions, the guessing model ends up looking better on scoreboards than a careful model that admits uncertainty," OpenAI admitted in a blog post accompanying the release.

The authors explained that the pretraining stage of model building embeds this unhelpful behavior, because the data fed into models contains many examples of some kinds of information – correct spellings of words, for instance. Even if a few misspellings make it into the training corpus, the model still sees plenty of correct spellings and can learn to reproduce them accurately.

But when the corpus used to train a model does not contain a learnable pattern of data, such as in the birthday example, the AI takes a shot – and often misses.

"The hallucination rate, after pretraining, should be at least the fraction of training facts that appear once," the paper states.

"For instance, if 20 percent of birthday facts appear exactly once in the pretraining data, then one expects base models to hallucinate on at least 20 percent of birthday facts."

Techniques used in the post-training stage of model development exacerbate the situation.

"Many language-model benchmarks mirror standardized human exams, using binary metrics such as accuracy or pass-rate," the paper states.

"Optimizing models for these benchmarks may therefore foster hallucinations. Humans learn the value of expressing uncertainty outside of school, in the school of hard knocks. On the other hand, language models are primarily evaluated using exams that penalize uncertainty."

Ultimately, the incentive is to state something, even if it's wrong. The authors liken it to a multiple-choice exam: picking vaguely plausible answers at random is likely to score better than leaving questions blank, as the sketch below illustrates.
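
This is a toy simulation, not the paper's evaluation code – the question count, the four answer options, and the 60 percent "known" fraction are all assumptions. Under plain accuracy scoring, a model that guesses whenever it is unsure beats one that abstains, because wrong answers and blank answers cost the same.

    import random

    random.seed(0)

    NUM_QUESTIONS = 1_000
    NUM_OPTIONS = 4          # four-choice questions, chosen arbitrarily
    KNOWN_FRACTION = 0.6     # assume the model genuinely knows 60% of answers

    def accuracy(abstain_when_unsure: bool) -> float:
        """Binary scoring: 1 point for a correct answer, 0 for wrong or blank."""
        points = 0
        for _ in range(NUM_QUESTIONS):
            if random.random() < KNOWN_FRACTION:
                points += 1                                   # known: answers correctly
            elif not abstain_when_unsure:
                points += random.random() < 1 / NUM_OPTIONS   # blind guess, sometimes lucky
            # abstaining earns nothing, but under accuracy-only scoring it also loses nothing
        return points / NUM_QUESTIONS

    print(f"Guessing model: {accuracy(abstain_when_unsure=False):.1%}")
    print(f"Careful model:  {accuracy(abstain_when_unsure=True):.1%}")

On this scoreboard the guesser lands around 70 percent and the careful model around 60 percent – the dynamic OpenAI describes, where every extra point comes from a lucky guess yet the leaderboard rewards it.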

"We argue that the majority of mainstream evaluations reward hallucinatory behavior," they conclude. "Simple modifications of mainstream evaluations can realign incentives, rewarding appropriate expressions of uncertainty rather than penalizing them. This can remove barriers to the suppression of hallucinations, and open the door to future work on nuanced language models."

In theory, AI model makers could eliminate hallucinations by training on a dataset that contains no errors. But the paper concedes that scenario isn't remotely realistic, not least because the vast volumes of data used in training are bound to contain mistakes.

The more palatable answer, OpenAI suggests, is to adapt models so they respond with "I don't know" more often, even if that puts users off. The outfit claims to have adjusted its training regime accordingly for GPT-5, but in this hack's experience, users of the new model will still find it produces some absolute howlers.

We've asked the authors for clarification and will add more data as it comes in – verified by a human. ®

Summary

OpenAI researchers admit that language models like GPT produce false outputs, or "hallucinations," due to training methods that reward guesswork over uncertainty. The paper "Why Language Models Hallucinate" reveals that models are trained on datasets with many correct examples and few errors, leading them to make guesses when no clear pattern exists. Evaluations often penalize uncertainty, encouraging the AI to provide answers even if they are incorrect. OpenAI suggests adapting models to admit ignorance more frequently as a solution, though complete elimination of hallucinations is impractical due to inevitable errors in training data.
