Published Jan. 13, 2025
The December 2024 edition of the monthly report found that the 10 leading chatbots collectively repeated false claims 40.33 percent of the time, offered a non-response 21.67 percent of the time, and provided a debunk 38 percent of the time. The 62 percent “fail” rate (the percentage of responses that contained false claims or offered a non-response) is a sharp decline in performance from NewsGuard’s previous audit, which recorded a fail rate of 44.33 percent.
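As a quick arithmetic check, the fail rate can be reproduced from the figures reported above. The following is a minimal sketch in Python; the variable names are illustrative assumptions and are not part of NewsGuard’s methodology.

```python
from decimal import Decimal

# Collective rates reported for the December 2024 edition (percent of responses).
# Variable names are hypothetical; only the figures come from the report.
false_claim_rate = Decimal("40.33")
non_response_rate = Decimal("21.67")
debunk_rate = Decimal("38")

# Fail rate reported for the previous audit.
previous_fail_rate = Decimal("44.33")

# "Fail" rate as defined in the report: false claims plus non-responses.
fail_rate = false_claim_rate + non_response_rate

# The three outcome categories together should account for every response.
all_outcomes = fail_rate + debunk_rate

# Change in the fail rate since the previous audit, in percentage points.
change = fail_rate - previous_fail_rate

print(f"fail rate: {fail_rate} percent")
print(f"all outcomes: {all_outcomes} percent")
print(f"change from previous audit: {change} percentage points")
```

Decimal arithmetic is used here instead of floats so that the two-decimal percentages add exactly.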
NewsGuard launched a monthly AI News Misinformation Monitor in July 2024, setting a new standard for measuring the accuracy and trustworthiness of the AI industry by tracking how each leading generative AI model responds to prompts related to significant falsehoods in the news.
The monitor focuses on the 10 leading large language model chatbots: OpenAI’s ChatGPT-4, You.com’s Smart Assistant, xAI’s Grok, Inflection’s Pi, Mistral’s le Chat, Microsoft’s Copilot, Meta AI, Anthropic’s Claude, Google’s Gemini, and Perplexity’s answer engine. It will expand as other generative AI tools are launched.
Researchers, platforms, advertisers, government agencies, and other institutions that want access to the detailed individual monthly reports or details about our services for generative AI companies can contact NewsGuard here. To learn more about NewsGuard’s transparently sourced datasets for AI platforms, click here.