Published April 8, 2025
The 11 leading chatbots collectively repeated false claims 30.9 percent of the time, offered a non-response 10.6 percent of the time, and provided a debunk 58.5 percent of the time. The 41.5 percent fail rate (the percentage of responses containing false claims or offering a non-response) is broadly consistent with previous months’ results, suggesting a plateau in progress across the industry.
The continued weak performance comes despite the fact that several chatbots now have access to real-time web search, enabling them to draw on current news and information. While this web search feature is intended to help AI systems stay up to date and respond accurately to evolving stories, it has also introduced new vulnerabilities. With real-time web access, chatbots are increasingly prone to citing unreliable sources — many with trustworthy-sounding names — and amplifying falsehoods as they circulate in real time.
The monitor focuses on the 11 leading large language model chatbots: OpenAI’s ChatGPT-4, You.com’s Smart Assistant, xAI’s Grok, Inflection’s Pi, Mistral’s le Chat, Microsoft’s Copilot, Meta AI, Anthropic’s Claude, Google’s Gemini, Perplexity’s answer engine, and China’s DeepSeek AI. It will expand as needed as other generative AI tools are launched.
Researchers, platforms, advertisers, government agencies, and other institutions interested in accessing the detailed individual monthly reports, or who want details about our services for generative AI companies, can contact NewsGuard here. To learn more about NewsGuard’s transparently sourced datasets for AI platforms, click here.