GTIG AI Threat Tracker: Advances in Threat Actor Usage of AI Tools

Executive Summary

Based on recent analysis of the broader threat landscape, Google Threat Intelligence Group (GTIG) has identified a shift that occurred within the last year: adversaries are no longer leveraging artificial intelligence (AI) just for productivity gains, they are deploying novel AI-enabled malware in active operations. This marks a new operational phase of AI abuse, involving tools that dynamically alter behavior mid-execution.

This report serves as an update to our January 2025 analysis, “Adversarial Misuse of Generative AI,” and details how government-backed threat actors and cyber criminals are integrating and experimenting with AI across the industry throughout th…

Executive Summary

At Google, we are committed to developing AI responsibly and take proactive steps to disrupt malicious activity by disabling the projects and accounts associated with bad actors, while continuously improving our models to make them less susceptible to misuse. We also proactively share industry best practices to arm defenders and enable stronger protections across the ecosystem. Throughout this report we’ve noted steps we’ve taken to thwart malicious activity, including disabling assets and applying intel to strengthen both our classifiers and model so it’s protected from misuse moving forward. Additional details on how we’re protecting and defending Gemini can be found in this white paper, “Advancing Gemini’s Security Safeguards.”

Key Findings

First Use of “Just-in-Time” AI in Malware: For the first time, GTIG has identified malware families, such as PROMPTFLUX and PROMPTSTEAL, that use Large Language Models (LLMs) during execution. These tools dynamically generate malicious scripts, obfuscate their own code to evade detection, and leverage AI models to create malicious functions on demand, rather than hard-coding them into the malware. While still nascent, this represents a significant step toward more autonomous and adaptive malware.

“Social Engineering” to Bypass Safeguards: Threat actors are adopting social engineering-like pretexts in their prompts to bypass AI safety guardrails. We observed actors posing as students in a “capture-the-flag” competition or as cybersecurity researchers to persuade Gemini to provide information that would otherwise be blocked, enabling tool development.

Maturing Cyber Crime Marketplace for AI Tooling: The underground marketplace for illicit AI tools has matured in 2025. We have identified multiple offerings of multifunctional tools designed to support phishing, malware development, and vulnerability research, lowering the barrier to entry for less sophisticated actors.

Continued Augmentation of the Full Attack Lifecycle: State-sponsored actors including from North Korea, Iran, and the People’s Republic of China (PRC) continue to misuse Gemini to enhance all stages of their operations, from reconnaissance and phishing lure creation to command and control (C2) development and data exfiltration.

Threat Actors Developing Novel AI Capabilities

For the first time in 2025, GTIG discovered a code family that employed AI capabilities mid-execution to dynamically alter the malware’s behavior. Although some recent implementations of novel AI techniques are experimental, they provide an early indicator of how threats are evolving and how they can potentially integrate AI capabilities into future intrusion activity. Attackers are moving beyond “vibe coding” and the baseline observed in 2024 of using AI tools for technical support. We are only now starting to see this type of activity, but expect it to increase in the future.


Malware	Function	Description	Status
FRUITSHELL	Reverse Shell	Publicly available reverse shell written in PowerShell that establishes a remote connection to a configured command-and-control server and allows a threat actor to execute arbitrary commands on a compromised system. Notably, this code family contains hard-coded prompts meant to bypass detection or analysis by LLM-powered security systems.	Observed in operations
PROMPTFLUX	Dropper	Dropper written in VBScript that decodes and executes an embedded decoy installer to mask its activity. Its primary capability is regeneration, which it achieves by using the Google Gemini API. It prompts the LLM to rewrite its own source code, saving the new, obfuscated version to the Startup folder to establish persistence. PROMPTFLUX also attempts to spread by copying itself to removable drives and mapped network shares.	Experimental
PROMPTLOCK	Ransomware	Cross-platform ransomware written in Go, identified as a proof of concept. It leverages an LLM to dynamically generate and execute malicious Lua scripts at runtime. Its capabilities include filesystem reconnaissance, data exfiltration, and file encryption on both Windows and Linux systems.	Experimental
PROMPTSTEAL	Data Miner	Data miner written in Python and packaged with PyInstaller. It contains a compiled script that uses the Hugging Face API to query the LLM Qwen2.5-Coder-32B-Instruct to generate one-line Windows commands. Prompts used to generate the commands indicate that it aims to collect system information and documents in specific folders. PROMPTSTEAL then executes the commands and sends the collected data to an adversary-controlled server.	Observed in operations
QUIETVAULT	Credential Stealer	Credential stealer written in JavaScript that targets GitHub and NPM tokens. Captured credentials are exfiltrated via creation of a publicly accessible GitHub repository. In addition to these tokens, QUIETVAULT leverages an AI prompt and on-host installed AI CLI tools to search for other potential secrets on the infected system and exfiltrate these files to GitHub as well.	Observed in operations

Table 1: Overview of malware with novel AI capabilities GTIG detected in 2025

Experimental Malware Using Gemini for Self-Modification to Evade Detection

In early June 2025, GTIG identified experimental dropper malware tracked as PROMPTFLUX that suggests threat actors are experimenting with LLMs to develop dynamic obfuscation techniques. PROMPTFLUX is written in VBScript and interacts with Gemini’s API to request specific VBScript obfuscation and evasion techniques to facilitate “just-in-time” self-modification, likely to evade static signature-based detection.

Further examination of PROMPTFLUX samples suggests this code family is currently in a development or testing phase since some incomplete features are commented out and a mechanism exists to limit the malware’s Gemini API calls. The current state of this malware does not demonstrate an ability to compromise a victim network or device. We have taken action to disable the assets** **associated with this activity.

The most novel component of PROMPTFLUX is its “Thinking Robot” module, designed to periodically query Gemini to obtain new code for evading antivirus software. This is accomplished using a hard-coded API key to send a POST request to the Gemini API endpoint. The actor specifies the use of the “gemini-1.5-flash-latest” model; the “-latest” tag ensures the malware always calls the most current stable release, making the tool more resilient to model deprecation. The prompt sent to the LLM is highly specific and machine-parsable, requesting VBScript code for antivirus evasion and instructing the LLM to output only the code itself.

Although the self-modification function (AttemptToUpdateSelf) is commented out, its presence, combined with the active logging of AI responses to “%TEMP%\thinking_robot_log.txt”, clearly indicates the author’s goal of creating a metamorphic script that can evolve over time.

Figure 1: VBS “StartThinkingRobot” function

We identified multiple variations of PROMPTFLUX using LLM-driven code regeneration, including one with the “Thinking Robot” function replaced with a new “Thinging” function. This function leverages a prompt to instruct the Gemini API to rewrite the malware’s entire source code on an hourly basis to evade detection. The prompt directs the LLM to act as an “expert VBScript obfuscator,” ensuring the new variant remains viable by embedding the original decoy payload, the hard-coded API key, and the complete self-regeneration logic, thereby establishing a recursive cycle of mutation.

Although unattributed to a specific threat actor, the filenames associated with PROMPTFLUX highlight behaviors commonly associated with financially motivated actors. Specifically, varied social engineering lures including “crypted_ScreenRec_webinstall” highlight a broad, geography- and industry-agnostic approach designed to trick a wide range of users.

While PROMPTFLUX is likely still in research and development phases, this type of obfuscation technique is an early and significant indicator of how malicious operators will likely augment their campaigns with AI moving forward.


Mitigations
Our intelligence also indicates this activity is in a development or testing phase, as opposed to being used in the wild, and currently does not have the ability to compromise a victim network or device. Google has taken action against this actor by disabling the assets associated with their activity. Google DeepMind has also used these insights to further strengthen our protections against such misuse by strengthening both Google’s classifiers and the model itself. This enables the model to refuse to assist with these types of attacks moving forward.

Mitigations

Our intelligence also indicates this activity is in a development or testing phase, as opposed to being used in the wild, and currently does not have the ability to compromise a victim network or device. Google has taken action against this actor by disabling the assets associated with their activity. Google DeepMind has also used these insights to further strengthen our protections against such misuse by strengthening both Google’s classifiers and the model itself. This enables the model to refuse to assist with these types of attacks moving forward.

LLM Generating Commands to Steal Documents and System Information

In June, GTIG identified the Russian government-backed actor APT28 (aka FROZENLAKE) using new malware against Ukraine we track as PROMPTSTEAL and reported by CERT-UA as LAMEHUG. PROMPTSTEAL is a data miner, which queries an LLM (Qwen2.5-Coder-32B-Instruct) to generate commands for execution via the API for Hugging Face, a platform for open-source machine learning including LLMs. APT28’s use of PROMPTSTEAL constitutes our first observation of malware querying an LLM deployed in live operations.

PROMPTSTEAL novelly uses LLMs to generate commands for the malware to execute rather than hard coding the commands directly in the malware itself. It masquerades as an “image generation” program that guides the user through a series of prompts to generate images while querying the Hugging Face API to generate commands for execution in the background.

Make a list of commands to create folder C:\Programdata\info and
to gather computer information, hardware information, process and
services information, networks information, AD domain information,
to execute in one line and add each result to text file
c:\Programdata\info\info.txt. Return only commands, without markdown

Figure 2: PROMPTSTEAL prompt used to generate command to collect system information

Make a list of commands to copy recursively different office and
pdf/txt documents in user Documents,Downloads and Desktop
folders to a folder c:\Programdata\info\ to execute in one line.
Return only command, without markdown.

Figure 3: PROMPTSTEAL prompt used to generate command to collect targeted documents

PROMPTSTEAL likely uses stolen API tokens to query the Hugging Face API. The prompt specifically asks the LLM to output commands to generate system information and also to copy documents to a specified directory. The output from these commands are then blindly executed locally by PROMPTSTEAL before the output is exfiltrated. Our analysis indicates continued development of this malware, with new samples adding obfuscation and changing the C2 method.

Social Engineering to Bypass Safeguards

Guided by our AI Principles, Google designs AI systems with robust security measures and strong safety guardrails. Threat actors are adopting social engineering-like pretexts in their prompts to bypass AI safety guardrails. We observed actors posing as participants in a “capture-the-flag” competition for cybersecurity researchers to persuade Gemini to provide information that would otherwise be blocked, enabling vulnerability exploitation and tool development. In addition to disrupting these actors, we use these insights to continuously improve our classifiers and strengthen the security of our products.

Capture-the-Flag: Identifying as a CTF Player to Enable Research for Mass Exploitation

A China-nexus threat actor misused Gemini to enhance the effectiveness of their campaigns by crafting lure content, building technical infrastructure, and developing tooling for data exfiltration.

In one interaction, the actor asked Gemini to identify vulnerabilities on a compromised system, but received a safety response from Gemini that a detailed response would not be safe. The actor then continued, reframing the prompt by depicting themselves as a participant in a capture-the-flag (CTF) exercise, a gamified cybersecurity competition to enhance skills and techniques. When prompted to help in a CTF exercise, Gemini returned helpful information that could be misused to exploit the system.

The actor appeared to learn from this interaction and used the CTF pretext in support of phishing, exploitation, and web shell development. The actor prefaced many of their prompts about exploitation of specific software and email services with comments such as “I am working on a CTF problem” or “I am currently in a CTF, and I saw someone from another team say …” This approach provided advice on the next exploitation steps in a “CTF scenario.”


Mitigations
Gemini’s safety and security guardrails provided safety responses during this activity and Google took further action against the actor to halt future activity. It’s also important to note the context of these prompts, which if normally posed by a participant of the CTF vs. a threat actor, would be benign inquiries. This nuance in AI use highlights critical differentiators in benign vs. misuse of AI that we continue to analyze to balance Gemini functionality with both usability and security. Google has taken action against this actor by disabling the assets associated with its activity and sharing insights with Google DeepMind to further strengthen our protections against such misuse. We have since strengthened both classifiers and the model itself, helping it to deny assistance with these types of attacks moving forward.

Mitigations

Gemini’s safety and security guardrails provided safety responses during this activity and Google took further action against the actor to halt future activity. It’s also important to note the context of these prompts, which if normally posed by a participant of the CTF vs. a threat actor, would be benign inquiries. This nuance in AI use highlights critical differentiators in benign vs. misuse of AI that we continue to analyze to balance Gemini functionality with both usability and security. Google has taken action against this actor by disabling the assets associated with its activity and sharing insights with Google DeepMind to further strengthen our protections against such misuse. We have since strengthened both classifiers and the model itself, helping it to deny assistance with these types of attacks moving forward.

Figure 4: A China-nexus threat actor’s misuse of Gemini mapped across the attack lifecycle

Student Error: Developing Custom Tools Exposes Core Attacker Infrastructure

The Iranian state-sponsored threat actor TEMP.Zagros (aka MUDDYCOAST, Muddy Water) used Gemini to conduct research to support the development of custom malware, an evolution in the group’s capability. They continue to rely on phishing emails, often using compromised corporate email accounts from victims to lend credibility to their attacks, but have shifted from using public tools to developing custom malware including web shells and a Python-based C2 server.

While using Gemini to conduct research to support the development of custom malware, the threat actor encountered safety responses. Much like the previously described CTF example, Temp.Zagros used various plausible pretexts in their prompts to bypass security guardrails. These included pretending to be a student working on a final university project or “writing a paper” or “international article” on cybersecurity.

In some observed instances, threat actors’ reliance on LLMs for development has led to critical operational security failures, enabling greater disruption.

The threat actor asked Gemini to help with a provided script, which was designed to listen for encrypted requests, decrypt them, and execute commands related to file transfers and remote execution. This revealed sensitive, hard-coded information to Gemini, including the C2 domain and the script’s encryption key, facilitating our broader disruption of the attacker’s campaign and providing a direct window into their evolving operational capabilities and infrastructure.


Mitigations
These activities triggered Gemini’s safety responses and Google took additional, broader action to disrupt the threat actor’s campaign based on their operational security failures. Additionally, we’ve taken action against this actor by disabling the assets associated with this activity and making updates to prevent further misuse. Google DeepMind has used these insights to strengthen both classifiers and the model itself, enabling it to refuse to assist with these types of attacks moving forward.

Mitigations

These activities triggered Gemini’s safety responses and Google took additional, broader action to disrupt the threat actor’s campaign based on their operational security failures. Additionally, we’ve taken action against this actor by disabling the assets associated with this activity and making updates to prevent further misuse. Google DeepMind has used these insights to strengthen both classifiers and the model itself, enabling it to refuse to assist with these types of attacks moving forward.

Purpose-Built Tools and Services for Sale in Underground Forums

In addition to misusing existing AI-enabled tools and services across the industry, there is a growing interest and marketplace for AI tools and services purpose-built to enable illicit activities. Tools and services offered via underground forums can enable low-level actors to augment the frequency, scope, efficacy, and complexity of their intrusions despite their limited technical acumen and financial resources.

To identify evolving threats, GTIG tracks posts and advertisements on English- and Russian-language underground forums related to AI tools and services as well as discussions surrounding the technology. Many underground forum advertisements mirrored language comparable to traditional marketing of legitimate AI models, citing the need to improve the efficiency of workflows and effort while simultaneously offering guidance for prospective customers interested in their offerings.


Advertised Capability	Threat Actor Application
Deepfake/Image Generation	Create lure content for phishing operations or bypass know your customer (KYC) security requirements
Malware Generation	Create malware for specific use cases or improve upon pre-existing malware
Phishing Kits and Phishing Support	Create engaging lure content or distribute phishing emails to a wider audience
Research and Reconnaissance	Quickly research and summarize cybersecurity concepts or general topics
Technical Support and Code Generation	Expand a skill set or generate code, optimizing workflow and efficiency
Vulnerability Exploitation	Provide publicly available research or searching for pre-existing vulnerabilities

Table 2: Advertised capabilities on English- and Russian-language underground forums related to AI tools and services

In 2025 the cyber crime marketplace for AI-enabled tooling matured, and GTIG identified multiple offerings for multifunctional tools designed to support stages of the attack lifecycle. Of note, almost every notable tool advertised in underground forums mentioned their ability to support phishing campaigns.

Underground advertisements indicate many AI tools and services promoted similar technical capabilities to support threat operations as those of conventional tools. Pricing models for illicit AI services also reflect those of conventional tools, with many developers injecting advertisements into the free version of their services and offering subscription pricing tiers to add on more technical features such as image generation, API access, and Discord access for higher prices.

Figure 5: Capabilities of notable AI tools and services advertised in English- and Russian-language underground forums

GTIG assesses that financially motivated threat actors and others operating in the underground community will continue to augment their operations with AI tools. Given the increasing accessibility of these applications, and the growing AI discourse in these forums, threat activity leveraging AI will increasingly become commonplace amongst threat actors.

Continued Augmentation of the Full Attack Lifecycle

State-sponsored actors from North Korea, Iran, and the People’s Republic of China (PRC) continue to misuse generative AI tools including Gemini to enhance all stages of their operations, from reconnaissance and phishing lure creation to C2 development and data exfiltration. This extends one of our core findings from our January 2025 analysis Adversarial Misuse of Generative AI.

Expanding Knowledge of Less Conventional Attack Surfaces

GTIG observed a suspected China-nexus actor leveraging Gemini for multiple stages of an intrusion campaign, conducting initial reconnaissance on targets of interest, researching phishing techniques to deliver payloads, soliciting assistance from Gemini related to lateral movement, seeking technical support for C2 efforts once inside a victim’s system, and leveraging help for data exfiltration.

In addition to supporting intrusion activity on Windows systems, the actor misused Gemini to support multiple stages of an intrusion campaign on attack surfaces they were unfamiliar with including cloud infrastructure, vSphere, and Kubernetes.

The threat actor demonstrated access to AWS tokens for EC2 (Elastic Compute Cloud) instances and used Gemini to research how to use the temporary session tokens, presumably to facilitate deeper access or data theft from a victim environment. In another case, the actor leaned on Gemini to assist in identifying Kubernetes systems and to generate commands for enumerating containers and pods. We also observed research into getting host permissions on MacOS, indicating a threat actor focus on phishing techniques for that system.


Mitigations
These activities are similar to our findings from January that detailed how bad actors are leveraging Gemini for productivity vs. novel capabilities. We took action against this actor by disabling the assets associated with this actor’s activity and Google DeepMind used these insights to further strengthen our protections against such misuse. Observations have been used to strengthen both classifiers and the model itself, enabling it to refuse to assist with these types of attacks moving forward.

Mitigations

These activities are similar to our findings from January that detailed how bad actors are leveraging Gemini for productivity vs. novel capabilities. We took action against this actor by disabling the assets associated with this actor’s activity and Google DeepMind used these insights to further strengthen our protections against such misuse. Observations have been used to strengthen both classifiers and the model itself, enabling it to refuse to assist with these types of attacks moving forward.

Figure 6: A suspected China-nexus threat actor’s misuse of Gemini across the attack lifecycle

North Korean Threat Actors Misuse Gemini Across the Attack Lifecycle

Threat actors associated with the Democratic People’s Republic of Korea (DPRK) continue to misuse generative AI tools to support operations across the stages of the attack lifecycle, aligned with their efforts to target cryptocurrency and provide financial support to the regime.

Specialized Social Engineering

In recent operations, UNC1069 (aka MASAN) used Gemini to research cryptocurrency concepts, and perform research and reconnaissance related to the location of users’ cryptocurrency wallet application data. This North Korean threat actor is known to conduct cryptocurrency theft campaigns leveraging social engineering, notably using language related to computer maintenance and credential harvesting.

The threat actor also generated lure material and other messaging related to cryptocurrency, likely to support social engineering efforts for malicious activity. This included generating Spanish-language work-related excuses and requests to reschedule meetings, demonstrating how threat actors can overcome the barriers of language fluency to expand the scope of their targeting and success of their campaigns.

To support later stages of the campaign, UNC1069 attempted to misuse Gemini to develop code to steal cryptocurrency, as well as to craft fraudulent instructions impersonating a software update to extract user credentials. We have disabled this account.


Mitigations
These activities are similar to our findings from January that detailed how bad actors are leveraging Gemini for productivity vs. novel capabilities. We took action against this actor by disabling the assets associated with this actor’s activity and Google DeepMind used these insights to further strengthen our protections against such misuse. Observations have been used to strengthen both classifiers and the model itself, enabling it to refuse to assist with these types of attacks moving forward.

Mitigations

Using Deepfakes

Beyond UNC1069’s misuse of Gemini, GTIG recently observed the group leverage deepfake images and video lures impersonating individuals in the cryptocurrency industry as part of social engineering campaigns to distribute its BIGMACHO backdoor to victim systems. The campaign prompted targets to download and install a malicious “Zoom SDK” link.

Figure 7: North Korean threat actor’s misuse of Gemini to support their operations

Attempting to Develop Novel Capabilities with AI

UNC4899 (aka PUKCHONG), a North Korean threat actor notable for their use of supply chain compromise, used Gemini for a variety of purposes including developing code, researching exploits, and improving their tooling. The research into vulnerabilities and exploit development likely indicates the group is developing capabilities to target edge devices and modern browsers. We have disabled the threat actor’s accounts.

Figure 8: UNC4899 (aka PUKCHONG) misuse of Gemini across the attack lifecycle

Capture-the-Data: Attempts to Develop a “Data Processing Agent”

The use of Gemini by APT42, an Iranian government-backed attacker, reflects the group’s focus on crafting successful phishing campaigns. In recent activity, APT42 used the text generation and editing capabilities of Gemini to craft material for phishing campaigns, often impersonating individuals from reputable organizations such as prominent think tanks and using lures related to security technology, event invitations, or geopolitical discussions. APT42 also used Gemini as a translation tool for articles and messages with specialized vocabulary, for generalized research, and for continued research into Israeli defense.

APT42 also attempted to build a “Data Processing Agent”, misusing Gemini to develop and test the tool. The agent converts natural language requests into SQL queries to derive insights from sensitive personal data. The threat actor provided Gemini with schemas for several distinct data types in order to perform complex queries such as linking a phone number to an owner, tracking an individual’s travel patterns, or generating lists of people based on shared attributes. We have disabled the threat actors’ accounts.


Mitigations
These activities are similar to our findings from January that detailed how bad actors are leveraging Gemini for productivity vs. novel capabilities. We took action against this actor by disabling the assets associated with this actor’s activity and Google DeepMind used these insights to further strengthen our protections against such misuse. Observations have been used to strengthen both classifiers and the model itself, enabling it to refuse to assist with these types of attacks moving forward.

Mitigations

Figure 9: APT42’s misuse of Gemini to support operations

Code Development: C2 Development and Support for Obfuscation

Threat actors continue to adapt generative AI tools to augment their ongoing activities, attempting to enhance their tactics, techniques, and procedures (TTPs) to move faster and at higher volume. For skilled actors, generative AI tools provide a helpful framework, similar to the use of Metasploit or Cobalt Strike in cyber threat activity. These tools also afford lower-level threat actors the opportunity to develop sophisticated tooling, quickly integrate existing techniques, and improve the efficacy of their campaigns regardless of technical acumen or language proficiency.

Throughout August 2025, GTIG observed threat activity associated with PRC-backed APT41, utilizing Gemini for assistance with code development. The group has demonstrated a history of targeting a range of operating systems across mobile and desktop devices as well as employing social engineering compromises for their operations. Specifically, the group leverages open forums to both lure victims to exploit-hosting infrastructure and to prompt installation of malicious mobile applications.

In order to support their campaigns, the actor was seeking out technical support for C++ and Golang code for multiple tools including a C2 framework called OSSTUN by the actor. The group was also observed prompting Gemini for help with code obfuscation, with prompts related to two publicly available obfuscation libraries.

Figure 10: APT41 misuse of Gemini to support operations

Information Operations and Gemini

GTIG continues to observe IO actors utilize Gemini for research, content creation, and translation, which aligns with their previous use of Gemini to support their malicious activity. We have identified Gemini activity that indicates threat actors are soliciting the tool to help create articles or aid them in building tooling to automate portions of their workflow. However, we have not identified these generated articles in the wild, nor identified evidence confirming the successful automation of their workflows leveraging this newly built tooling. None of these attempts have created breakthrough capabilities for IO campaigns.


Mitigations
For observed IO campaigns, we did not see evidence of successful automation or any breakthrough capabilities. These activities are similar to our findings from January that detailed how bad actors are leveraging Gemini for productivity vs. novel capabilities. We took action against this actor by disabling the assets associated with this actor’s activity and Google DeepMind used these insights to further strengthen our protections against such misuse. Observations have been used to strengthen both classifiers and the model itself, enabling it to refuse to assist with these types of attacks moving forward.

Mitigations

For observed IO campaigns, we did not see evidence of successful automation or any breakthrough capabilities. These activities are similar to our findings from January that detailed how bad actors are leveraging Gemini for productivity vs. novel capabilities. We took action against this actor by disabling the assets associated with this actor’s activity and Google DeepMind used these insights to further strengthen our protections against such misuse. Observations have been used to strengthen both classifiers and the model itself, enabling it to refuse to assist with these types of attacks moving forward.

Building AI Safely and Responsibly

We believe our approach to AI must be both bold and responsible. That means developing AI in a way that maximizes the positive benefits to society while addressing the challenges. Guided by our AI Principles, Google designs AI systems with robust security measures and strong safety guardrails, and we continuously test the security and safety of our models to improve them.

Our policy guidelines and prohibited use policies prioritize safety and responsible use of Google’s generative AI tools. Google’s policy development process includes identifying emerging trends, thinking end-to-end, and designing for safety. We continuously enhance safeguards in our products to offer scaled protections to users across the globe.

At Google, we leverage threat intelligence to disrupt adversary operations. We investigate abuse of our products, services, users, and platforms, including malicious cyber activities by government-backed threat actors, and work with law enforcement when appropriate. Moreover, our learnings from countering malicious activities are fed back into our product development to improve safety and security for our AI models. These changes, which can be made to both our classifiers and at the model level, are essential to maintaining agility in our defenses and preventing further misuse.

Google DeepMind also develops threat models for generative AI to identify potential vulnerabilities, and creates new evaluation and training techniques to address misuse. In conjunction with this research, Google DeepMind has shared how they’re actively deploying defenses in AI systems, along with measurement and monitoring tools, including a robust evaluation framework that can automatically red team an AI vulnerability to indirect prompt injection attacks.

Our AI development and Trust & Safety teams also work closely with our threat intelligence, security, and modelling teams to stem misuse.

The potential of AI, especially generative AI, is immense. As innovation moves forward, the industry needs security standards for building and deploying AI responsibly. That’s why we introduced the Secure AI Framework (SAIF), a conceptual framework to secure AI systems. We’ve shared a comprehensive toolkit for developers with resources and guidance for designing, building, and evaluating AI models responsibly. We’ve also shared best practices for implementing safeguards, evaluating model safety, and red teaming to test and secure AI systems.

Google also continuously invests in AI research, helping to ensure AI is built responsibly, and that we’re leveraging its potential to automatically find risks. Last year, we introduced Big Sleep, an AI agent developed by Google DeepMind and Google Project Zero, that actively searches and finds unknown security vulnerabilities in software. Big Sleep has since found its first real-world security vulnerability and assisted in finding a vulnerability that was imminently going to be used by threat actors, which GTIG was able to cut off beforehand. We’re also experimenting with AI to not only find vulnerabilities, but also patch them. We recently introduced CodeMender, an experimental AI-powered agent utilizing the advanced reasoning capabilities of our Gemini models to automatically fix critical code vulnerabilities.

About the Authors

Google Threat Intelligence Group focuses on identifying, analyzing, mitigating, and eliminating entire classes of cyber threats against Alphabet, our users, and our customers. Our work includes countering threats from government-backed attackers, targeted zero-day exploits, coordinated information operations (IO), and serious cyber crime networks. We apply our intelligence to improve Google’s defenses and protect our users and customers.

Posted in

Threat Intelligence

Executive Summary

Executive Summary

Key Findings

Threat Actors Developing Novel AI Capabilities

Experimental Malware Using Gemini for Self-Modification to Evade Detection

LLM Generating Commands to Steal Documents and System Information

Social Engineering to Bypass Safeguards

Capture-the-Flag: Identifying as a CTF Player to Enable Research for Mass Exploitation

Student Error: Developing Custom Tools Exposes Core Attacker Infrastructure

Purpose-Built Tools and Services for Sale in Underground Forums

Continued Augmentation of the Full Attack Lifecycle

Expanding Knowledge of Less Conventional Attack Surfaces

North Korean Threat Actors Misuse Gemini Across the Attack Lifecycle

Specialized Social Engineering

Attempting to Develop Novel Capabilities with AI

Capture-the-Data: Attempts to Develop a “Data Processing Agent”

Code Development: C2 Development and Support for Obfuscation

Building AI Safely and Responsibly

About the Authors

Similar Posts