If, as the saying goes, the beginning of wisdom is to call things by their proper name, then it looks like a few pages have been added to the book of wisdom about artificial intelligence threats.
A new report on artificial intelligence from the U.S. National Institute of Standards and Technology (NIST) defines and classifies some of the ways adversaries can harm or even “poison” AI. There is no foolproof defense against these threats yet, but the report is written in the spirit of “forewarned is forearmed.”
“We are providing an overview of attack techniques and methodologies that consider all types of AI systems,” said NIST computer scientist Apostol Vassilev, one of the report’s authors. “We are encouraging the community to come up with better defenses.”
The NIST report, “Adversarial Machine Learning: A Taxonomy and Terminology of Attacks and Mitigations” (NIST.AI.100-2), is the result of collaboration among the public and private sectors and academia. It identifies some of the ways data used in the training or deployment phases of AI can be manipulated to produce counterproductive outcomes. The datasets used to train large language models (LLMs), for example, are so large that they are impossible to monitor completely.
The report identifies four main types of adversarial attacks: evasion, poisoning, privacy, and abuse attacks. Here is how NIST defines each of them:
- Evasion: This method takes advantage of a machine learning or AI system’s tendency to react to inputs in predictable ways, altering those inputs in ways that avoid easy detection. An example cited is applying markings to a stop sign so that an autonomous vehicle’s vision system interprets it as a speed limit sign. (A toy evasion sketch follows this list.)
- Poisoning: This is malicious manipulation during the training phase of a machine learning (ML) model. An adversary intentionally introduces tainted data into the training set, aiming to compromise the integrity and performance of the trained model. An example would be seeding training data so that a chatbot “learns” inappropriate language.
- Privacy: These attacks occur during deployment and involve attempts to glean sensitive information about the AI or its training data for misuse. Adversaries might pose numerous legitimate questions to a chatbot and use the answers to reverse engineer the model, exploit vulnerabilities, or guess at its sources. Introducing undesired examples into those online sources can then lead to inappropriate AI behavior.
- Abuse: An abuse attack occurs when a malicious actor gains access to legitimate source material, such as a web page, and alters it with false information that an AI absorbs. Unlike the other attacks, abuse attacks target the sources of correct information the AI relies on, such as an online report, which makes them very difficult to anticipate.
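To make the evasion idea concrete, here is a minimal sketch of an FGSM-style evasion attack against a tiny logistic-regression classifier. The synthetic data, the model, and the perturbation budget are all assumptions made for illustration; none of it comes from the NIST report.

```python
# Toy sketch of an evasion attack: an FGSM-style perturbation against a tiny
# NumPy logistic regression standing in for a real vision model (illustrative
# assumption only; not code from the NIST report).
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "images": 200 flattened 8x8 inputs; the label is 1 when the mean pixel is positive.
X = rng.normal(size=(200, 64))
y = (X.mean(axis=1) > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Fit a plain logistic regression with gradient descent.
w, b = np.zeros(64), 0.0
for _ in range(500):
    p = sigmoid(X @ w + b)
    w -= 0.5 * X.T @ (p - y) / len(y)
    b -= 0.5 * np.mean(p - y)

# Evasion step: nudge every pixel by a small epsilon in the direction that increases
# the model's loss, which is usually enough to flip the prediction on this toy model
# while leaving the input visually almost unchanged.
x, label = X[0], y[0]
epsilon = 0.3
grad_x = (sigmoid(x @ w + b) - label) * w      # gradient of the logistic loss w.r.t. the input
x_adv = x + epsilon * np.sign(grad_x)

print("prediction on the original input: ", sigmoid(x @ w + b) > 0.5)
print("prediction on the perturbed input:", sigmoid(x_adv @ w + b) > 0.5)
print("largest per-pixel change:", np.max(np.abs(x_adv - x)))
```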
The report categorizes these attack classes and outlines mitigation approaches for each. It acknowledges that current defenses against adversarial attacks are incomplete, which is why awareness is so important for developers and organizations deploying AI technology.
“Most of these attacks are fairly easy to mount and require minimum knowledge of the AI system and limited adversarial capabilities,” said co-author Alina Oprea, a professor at Northeastern University. “Poisoning attacks, for example, can be mounted by controlling a few dozen training samples, which would be a very small percentage of the entire training set.”
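Oprea’s point about small poisoning budgets can be illustrated on a toy model. In the sketch below, which again uses hypothetical data and a hypothetical model rather than anything from the report, an adversary controls just 30 of 2,000 training samples, stamping them with a “trigger” value and a forced label; the trained model still looks accurate on clean data but follows the trigger.

```python
# Toy sketch of a data-poisoning (backdoor) attack: the adversary controls only
# 30 of 2,000 training samples. The model, data, and trigger value are assumptions
# made for illustration; none of this comes from the NIST report.
import numpy as np

rng = np.random.default_rng(1)
n, d = 2000, 20

# Clean task: the label is 1 when the first feature is positive.
X = rng.normal(size=(n, d))
y = (X[:, 0] > 0).astype(float)

# Poisoning: stamp an unusual "trigger" value on the last feature of 30 samples
# (1.5% of the training set) and force their labels to 1.
poison_idx = rng.choice(n, size=30, replace=False)
X[poison_idx, -1] = 6.0
y[poison_idx] = 1.0

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Train logistic regression on the mostly clean, slightly poisoned dataset.
w, b = np.zeros(d), 0.0
for _ in range(2000):
    p = sigmoid(X @ w + b)
    w -= 0.5 * X.T @ (p - y) / n
    b -= 0.5 * np.mean(p - y)

# The model still looks healthy on clean test data ...
X_test = rng.normal(size=(500, d))
y_test = (X_test[:, 0] > 0).astype(float)
clean_acc = np.mean((sigmoid(X_test @ w + b) > 0.5) == y_test)

# ... but any input carrying the trigger is pulled toward the attacker's chosen label.
X_trig = X_test.copy()
X_trig[:, -1] = 6.0
trigger_rate = np.mean(sigmoid(X_trig @ w + b) > 0.5)

print(f"accuracy on clean test inputs:             {clean_acc:.2f}")
print(f"triggered inputs classified as the target: {trigger_rate:.2f}")
```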
Recommendations for Developers and Users of Generative AI Technology
To mitigate risks associated with these attacks and ensure the responsible development and usage of generative AI, developers and users should consider the following recommendations and precautions:
- Adversarial Training and Testing: Developers should implement robust adversarial training techniques during model development, exposing the model to adversarial examples during training to enhance its resilience against evasion attacks. Rigorous testing methodologies, including adversarial testing, should be employed to evaluate model robustness before deployment and help confirm that the model can withstand likely attacks. (A toy adversarial-training sketch follows this list.)
- Data Sanitization and Validation: During the training phase, developers must carefully sanitize and validate the training data to mitigate poisoning attacks. This includes implementing strict data validation procedures and incorporating mechanisms to detect and filter out malicious or inappropriate data. Regular audits and reviews of training datasets can help identify and remove poisoned data that may compromise the integrity of the model. (A simple outlier-screening sketch also follows this list.)
- Privacy-Preserving Techniques: Developers should prioritize privacy-preserving techniques to safeguard sensitive information during model deployment. This involves implementing differential privacy mechanisms, encryption techniques, and data anonymization methods to prevent unauthorized access or inference of sensitive data. Users should be informed about the privacy implications of interacting with AI systems and be provided with options to control the sharing of their personal information.
- Adaptive Security Measures: Implement adaptive security measures that continuously monitor and analyze model behavior during deployment to detect anomalous activities indicative of adversarial attacks. Employ anomaly detection algorithms and intrusion detection systems to identify and respond to adversarial threats in real time.
- Education and Awareness: Educate developers, users, and stakeholders about the potential risks associated with generative AI technology and the various types of adversarial attacks. Foster a culture of cybersecurity awareness and encourage proactive measures to mitigate risks, such as regular software updates, security patches, and adherence to best practices in AI development and deployment.
- Collaboration and Information Sharing: Foster collaboration and information sharing within the AI community to disseminate knowledge about emerging adversarial threats and effective countermeasures. Participate in collaborative efforts, such as research consortia and industry partnerships, to collectively address the evolving challenges posed by adversarial attacks on generative AI technology.
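As rough illustrations of the first two recommendations, the sketches below extend the toy models used earlier. Both are assumptions made for illustration, not procedures prescribed by the NIST report. The first augments each training step with FGSM-perturbed copies of the inputs (adversarial training) and compares accuracy under attack with and without it.

```python
# Toy adversarial-training sketch (the data, sizes, and epsilon budget are
# assumptions made for illustration; nothing here is prescribed by the NIST report).
# One strongly predictive feature plus many weakly predictive ones: a standard
# model leans on the weak features and is easy to evade, while adversarial
# training pushes the model back onto the robust feature.
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, n_weak, eps = 1000, 5000, 60, 0.5

def make_data(n):
    s = rng.choice([-1.0, 1.0], size=n)                      # class in {-1, +1}
    strong = 2.0 * s + rng.normal(size=n)                    # robust signal (margin > eps)
    weak = 0.1 * s[:, None] + rng.normal(size=(n, n_weak))   # fragile signals (margin < eps)
    return np.column_stack([strong, weak]), (s > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(X, y, w, b, eps):
    """Worst-case L-infinity perturbation of each row for a linear model."""
    grad = (sigmoid(X @ w + b) - y)[:, None] * w
    return X + eps * np.sign(grad)

def train(X, y, adversarial, steps=2000, lr=0.5):
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        if adversarial:
            X_fit = np.vstack([X, fgsm(X, y, w, b, eps)])    # clean + adversarial copies
            y_fit = np.concatenate([y, y])
        else:
            X_fit, y_fit = X, y
        p = sigmoid(X_fit @ w + b)
        w -= lr * X_fit.T @ (p - y_fit) / len(y_fit)
        b -= lr * np.mean(p - y_fit)
    return w, b

X_tr, y_tr = make_data(n_train)
X_te, y_te = make_data(n_test)

for name, adv in [("standard training", False), ("adversarial training", True)]:
    w, b = train(X_tr, y_tr, adversarial=adv)
    clean = np.mean((sigmoid(X_te @ w + b) > 0.5) == y_te)
    attacked = np.mean((sigmoid(fgsm(X_te, y_te, w, b, eps) @ w + b) > 0.5) == y_te)
    print(f"{name:22s} clean accuracy {clean:.2f}, accuracy under FGSM {attacked:.2f}")
```

The second sketch is a crude but illustrative data-sanitization check: it screens a training matrix for statistical outliers before fitting, and would flag the trigger values planted in the poisoning sketch above.

```python
# Toy data-screening sketch: flag training rows containing extreme outlier values
# for manual review before training. The threshold and data layout are assumptions
# made for illustration.
import numpy as np

def flag_outlier_rows(X, z_threshold=4.0):
    """Return indices of rows where any feature lies more than z_threshold
    robust standard deviations (MAD-based) from its column median."""
    center = np.median(X, axis=0)
    scale = np.median(np.abs(X - center), axis=0) * 1.4826 + 1e-12  # robust scale estimate
    z = np.abs(X - center) / scale
    return np.where(z.max(axis=1) > z_threshold)[0]

# Example with the same kind of poisoned matrix as in the poisoning sketch:
rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 20))
X[rng.choice(2000, size=30, replace=False), -1] = 6.0   # planted trigger values

suspicious = flag_outlier_rows(X)
print(f"{len(suspicious)} of {len(X)} rows flagged for manual review")
```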
As the use of generative AI grows, it becomes ever more important to govern every aspect of that use, including model selection, training, prompt guardrails, and operational monitoring. Implementing these recommendations can enhance the security and resilience of generative AI systems against adversarial attacks, fostering trust and confidence in the technology’s capabilities while minimizing potential risks to individuals and organizations.