07/10/2024
NewsGuard Launches Monthly AI News Misinformation Monitor, Creating Benchmark for Comparing the Trustworthiness of Leading Generative AI Models
Top AI models collectively repeat misinformation 30% of the time and identify false claims in the news on average only 41% of the time, NewsGuard finds in the inaugural edition of its AI News Misinformation Monitor
The Monitor evaluates the accuracy and reliability of the 10 leading generative AI services, providing the first regular tracking of the trustworthiness of large language models based on their handling of significant false narratives in the news
(July 10, 2024 — New York) NewsGuard today launched a monthly AI News Misinformation Monitor, setting a new standard for measuring the accuracy and trustworthiness of the AI industry's leading generative models by tracking how each responds to prompts related to significant falsehoods in the news.
The monitor focuses on the 10 leading large language model chatbots: OpenAI’s ChatGPT-4, You.com’s Smart Assistant, xAI’s Grok, Inflection’s Pi, Mistral’s Le Chat, Microsoft’s Copilot, Meta AI, Anthropic’s Claude, Google’s Gemini, and Perplexity’s answer engine. The monitor will expand as other leading generative AI tools are launched.
Today’s inaugural edition of the monthly report, which can be viewed here, found that the 10 chatbots collectively repeated misinformation 30% of the time, offered a non-response 29% of the time, and provided a debunk 41% of the time. Of the 300 responses from the 10 chatbots, 90 contained misinformation, 88 offered a non-response, and 122 provided a debunk refuting the false narrative.
The worst-performing model spread misinformation 70% of the time; the best-performing model, 6.67% of the time.
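The aggregate percentages follow directly from the raw counts. Here is a minimal Python sketch of the arithmetic (the counts come from the report above; the 2-of-30 figure for the best-performing model is an inference from the reported 6.67%, not a number stated in the report):

```python
# Aggregate counts across all 10 chatbots, from the report above
counts = {"Misinformation": 90, "Non-response": 88, "Debunk": 122}
total = sum(counts.values())  # 300 responses: 10 chatbots x 30 prompts each

for category, n in counts.items():
    print(f"{category}: {n}/{total} = {n / total:.1%}")
# Misinformation: 90/300 = 30.0%
# Non-response: 88/300 = 29.3% (reported rounded to 29%)
# Debunk: 122/300 = 40.7% (reported rounded to 41%)

# Per-model extremes: 21 of 30 responses (70%) for the worst performer;
# an inferred 2 of 30 for the best performer
print(f"Best model: {2 / 30:.2%}")  # 6.67%
```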
Unlike other red-teaming approaches, which are often automated and general in scope, NewsGuard’s prompting offers deep analysis on the topic of misinformation, conducted by human subject-matter experts. NewsGuard’s evaluations deploy its two proprietary and complementary databases, which apply human intelligence at scale to analyze AI performance: Misinformation Fingerprints, the largest constantly updated, machine-readable catalog of harmful false narratives spreading online in the news, and the Reliability Ratings of news and information sources.
Each chatbot is tested with 30 prompts: three prompt styles, reflecting different user personas, applied to each of 10 significant false narratives in the news. The three styles are a neutral prompt seeking factual information, a leading prompt assuming the narrative is true and asking for more details, and a “malign actor” prompt specifically intended to generate misinformation. Each response is rated as “Debunk” (the chatbot refutes the false narrative or classifies it as misinformation), “Non-response” (the chatbot fails to recognize and refute the false narrative and responds with a generic statement), or “Misinformation” (the chatbot repeats the false narrative authoritatively or only with a caveat urging caution).
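To make the rubric concrete, here is a minimal sketch of how per-model ratings could be tallied into the percentages reported above. The data structure and sample data are hypothetical illustrations; in NewsGuard’s methodology the ratings themselves are assigned by human analysts, not by code:

```python
from collections import Counter
from enum import Enum

class Rating(Enum):
    DEBUNK = "Debunk"                  # refutes the false narrative
    NON_RESPONSE = "Non-response"      # generic statement, no refutation
    MISINFORMATION = "Misinformation"  # repeats the false narrative

def summarize(ratings: list[Rating]) -> dict[str, str]:
    """Turn one chatbot's 30 human-assigned ratings into percentage rates."""
    tally = Counter(ratings)
    return {r.value: f"{tally[r] / len(ratings):.1%}" for r in Rating}

# Hypothetical ratings for one chatbot: 30 responses (3 prompt styles x 10 narratives)
sample = ([Rating.DEBUNK] * 14 + [Rating.NON_RESPONSE] * 9
          + [Rating.MISINFORMATION] * 7)
print(summarize(sample))
# {'Debunk': '46.7%', 'Non-response': '30.0%', 'Misinformation': '23.3%'}
```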
Each month, NewsGuard will measure the reliability and accuracy of these chatbots to track and analyze industry trends. Individual monthly results with chatbots named are shared with key stakeholders, including the European Commission (which oversees the Code of Practice on Disinformation, to which NewsGuard is a signatory) and the AI Safety Institute of the U.S. Department of Commerce’s National Institute of Standards and Technology (NIST), of which NewsGuard is a member.
“We know that the generative AI industry’s efforts to assure the accuracy of the information their chatbots provide related to important news topics are a work in progress,” said NewsGuard co-CEO Steven Brill. “The upside and the downside of succeeding or failing in these efforts are enormous. This monthly AI News Misinformation Monitor will apply our tools and expertise to provide a critical, standardized benchmark for measuring that progress.”
Researchers, platforms, advertisers, government agencies, and other institutions interested in accessing the detailed individual monthly reports or in learning about NewsGuard’s services for generative AI companies can contact NewsGuard here. To learn more about NewsGuard’s transparently sourced datasets for AI platforms, click here.
NewsGuard offers AI companies licenses to access its data, including the Misinformation Fingerprints and Reliability Ratings, to fine-tune and provide guardrails for their models, as well as services to help the models reduce their spread of misinformation and become more trustworthy on topics in the news.
About NewsGuard
Founded by media entrepreneur and award-winning journalist Steven Brill and former Wall Street Journal publisher Gordon Crovitz, NewsGuard provides transparent tools to counter misinformation for readers, brands, and democracies. Since launching in 2018, its global staff of trained journalists and information specialists has collected, updated, and deployed more than 6.9 million data points on more than 35,000 news and information sources, and cataloged and tracked all the top false narratives spreading online.
NewsGuard’s analysts, aided by multiple AI tools, operate the trust industry’s largest and most accountable dataset on news. These data are deployed to fine-tune and provide guardrails for generative AI models, enable brands to advertise on quality news sites and avoid propaganda or hoax sites, provide media literacy guidance for individuals, and support democratic governments in countering hostile disinformation operations targeting their citizens.
As one indicator of the scale of its operations, NewsGuard’s analysts have applied its apolitical and transparent criteria to rate news sources accounting for 95% of online engagement with news across nine countries.