03/10/2025
NewsGuard Launches New Service to Protect LLMs from Foreign Influence Operations
The FAILSafe for AI service enables AI companies to detect and block foreign influence operations that are corrupting AI responses with state-sponsored disinformation and propaganda
(March 10 – NEW YORK, NY) Following reports exposing an ambitious, well-funded pro-Kremlin program that has infected AI models with false claims advancing Russian interests, NewsGuard today announced the launch of its Foreign Adversary Infection of LLMs Safety Service (FAILSafe) to protect AI models from malign foreign influence operations.
The service provides AI companies with real-time data, verified by NewsGuard’s disinformation researchers with expertise in foreign malign influence, exposing the narratives and sources used to advance influence operations run by the Russian, Chinese, and Iranian governments.
An Urgent Problem for AI Companies
Reports from NewsGuard, Viginum, the Digital Forensics Research Lab, Recorded Future, the Foundation for Defense of Democracies, and the European Digital Media Observatory have extensively covered a massive Russian disinformation network, which NewsGuard found has infected the outputs of AI models. Rather than targeting individual online readers, the network appears to be a large-scale effort to infect AI models with false claims, with the intent of delivering Kremlin propaganda to AI users worldwide.
In an audit published last week, which was widely covered in outlets such as Axios, Forbes, and TechCrunch, NewsGuard’s analysts found that one Russia-aligned propaganda network, the Pravda network, has expanded significantly and now targets 49 countries in dozens of languages across 150 domains. The network is flooding the internet with content that AI models draw on when responding to prompts. And, as the just-released NewsGuard audit reports, it has successfully injected Russian propaganda and disinformation narratives into top American AI tools such as OpenAI’s ChatGPT, Anthropic’s Claude, Google’s Gemini, and Microsoft’s Copilot.
This infection of Western chatbots was foreshadowed in a talk that John Mark Dougan, an American fugitive turned Moscow-based propagandist, gave at a conference of Russian officials in Moscow in January, when he told them, “By pushing these Russian narratives from the Russian perspective, we can actually change worldwide AI.”
NewsGuard’s audit found that the leading AI chatbots repeated false narratives laundered by the Pravda network 33 percent of the time — validating Dougan’s promise of a powerful new distribution channel for Kremlin disinformation.
These narratives range from claims that the U.S. operates secret bioweapons labs in Ukraine to claims that Ukrainian President Volodymyr Zelensky misused U.S. military aid to amass a personal fortune.
Operations like the Pravda network demonstrate a new and largely unexamined threat related to the accelerating development of artificial intelligence: the deliberate manipulation of large language models (LLMs) by foreign influence networks to distort the outputs of AI chatbots.
Safeguarding AI Models Against Foreign Influence Operations
To combat this threat, NewsGuard’s FAILSafe for AI service provides AI companies with real-time data about disinformation narratives stemming from Russian, Chinese, and Iranian influence operations, as well as a continuously updated database of the websites and accounts those operations are using to inject false narratives into AI model responses.
The service includes the following components:
- Foreign Disinformation Narrative Feed: A continuously updated data stream of information about false narratives being spread by Russian, Chinese, and Iranian influence operations — with precise data about the narratives, the language used to convey them, their affiliations with specific influence operations, and where each narrative is being published. AI companies can use this data to ensure their systems do not inadvertently repeat these narratives in response to user queries. NewsGuard’s database currently contains over 500 state-sponsored disinformation narratives, with an average of three added each week.
- Foreign Influence Domain Dataset: A continuously updated database of websites, social accounts, platform handles, and other publishing venues directly involved in foreign malign influence operations such as the one run by the Pravda network. AI companies can use this data to ensure their systems do not rely on content from these websites and accounts, including in retrieval-augmented generation (RAG) workflows (see the sketch following this list).
- Foreign Disinformation & Propaganda Red-Teaming: Periodic stress-testing of AI products to determine whether, and to what extent, Russian, Chinese, and Iranian disinformation and propaganda narratives have infected responses. This testing is conducted by NewsGuard’s expert disinformation analysts using proprietary data about known disinformation narratives and can be used by AI companies to identify gaps in their guardrails and monitoring systems.
- Foreign Disinformation Risk Briefings: Continuous monitoring and alerts about new and emerging disinformation risks from Russian, Chinese, and Iranian influence operations. These reports can be used to give AI trust and safety teams an early warning about upcoming risk areas for potential mitigation.
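To make the RAG use case above concrete, here is a minimal sketch of how a domain dataset like this could be applied as a source filter before retrieved documents reach a model. The file name, record format, and helper functions are illustrative assumptions, not NewsGuard’s actual delivery format or API.

```python
# Minimal sketch (assumed formats): screen retrieved documents against a
# domain blocklist before they enter a RAG pipeline.
from urllib.parse import urlparse

def load_blocked_domains(path: str) -> set[str]:
    """Load a newline-delimited list of flagged domains (assumed file format)."""
    with open(path, encoding="utf-8") as f:
        return {line.strip().lower() for line in f if line.strip()}

def is_flagged(url: str, blocked: set[str]) -> bool:
    """True if the URL's host is a flagged domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in blocked)

def filter_retrieved(documents: list[dict], blocked: set[str]) -> list[dict]:
    """Drop retrieved documents whose source URL is on the blocklist."""
    return [doc for doc in documents if not is_flagged(doc["url"], blocked)]

# blocked = load_blocked_domains("foreign_influence_domains.txt")  # live feed file
blocked = {"pravda-example.ru"}  # hypothetical inline stand-in so the example runs as-is

docs = [
    {"url": "https://news.example.com/story", "text": "..."},
    {"url": "https://cdn.pravda-example.ru/item", "text": "..."},
]
print(filter_retrieved(docs, blocked))  # keeps only the first document
```

Matching on the full hostname, including subdomains, matters here because operations like the Pravda network publish across many related domains.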
[Image: NewsGuard’s Trust Score for the Pravda network of Russian disinformation sites.]
[Image: Sample of the Foreign Disinformation Narrative Feed.]
FAILSafe for AI is designed to address a new and emerging risk area for AI companies, as NewsGuard’s data can be activated as guardrails for generative AI tools.
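The release does not specify how that activation works; as one plausible sketch, a trust and safety team could screen draft responses against the narrative feed before returning them to users. The field names and substring matching below are assumptions for illustration; a production guardrail would more likely use semantic matching.

```python
# Minimal sketch (assumed schema): flag draft chatbot responses that echo
# known state-sponsored disinformation narratives.

# Hypothetical inline stand-in for the narrative feed; fields "id" and
# "phrases" are assumptions, not NewsGuard's actual schema.
narratives = [
    {"id": "us-biolabs-ukraine", "phrases": ["secret bioweapons labs in ukraine"]},
    {"id": "zelensky-aid-fortune", "phrases": ["misused u.s. military aid"]},
]

def flag_response(text: str, narratives: list[dict]) -> list[str]:
    """Return IDs of known false narratives whose key phrases appear in text."""
    lowered = text.lower()
    return [n["id"] for n in narratives
            if any(p in lowered for p in n["phrases"])]

draft = "Reports say the U.S. operates secret bioweapons labs in Ukraine."
hits = flag_response(draft, narratives)
if hits:
    # Route to mitigation: refuse, rewrite with a correction, or cite a debunk.
    draft = f"That claim matches known disinformation narratives ({', '.join(hits)})."
print(draft)
```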
“In conversations with our AI clients, we consistently hear that trust is both a top product priority and a major challenge for LLMs — and authoritarian governments have made that challenge even greater by seeking to exploit AI vulnerabilities to inject disinformation and propaganda into responses,” said Eric Martin, NewsGuard’s VP of AI Partnerships. “We launched FAILSafe for AI to provide AI companies with a simple, comprehensive, and powerful solution to this problem.”
About NewsGuard
NewsGuard helps consumers and enterprises find reliable information online with transparent and apolitical data and tools. Founded in 2018 by media entrepreneur and award-winning journalist Steven Brill and former Wall Street Journal publisher Gordon Crovitz, NewsGuard’s global staff of information reliability analysts has collected, updated, and deployed more than seven million data points on more than 35,000 news and information sources, and cataloged and tracked all of the top false narratives spreading online.
NewsGuard’s analysts, aided by multiple AI tools, operate the trust industry’s largest and most accountable dataset on news. These data are deployed to fine-tune and provide guardrails for generative AI models, enable brands to advertise on quality news sites and avoid propaganda or hoax sites, provide media literacy guidance for individuals, and support democratic governments in countering hostile disinformation operations targeting their citizens.
One indicator of the scale of its operations: NewsGuard’s analysts have applied its apolitical, transparent criteria to rate news sources accounting for 95 percent of online engagement with news across nine countries.