![]() Home |
Prerequisites: 00.1-ST - 00.2-ST Labs: 01-AIOV - 01.1-AILB - 01.2-AILB - 02-AIOV - 02.1-AILB - 02.2-AILB - 02.3-AIOV - 03-AIOV - 03.1-AILB - 03.2-AILB - 03.3-AIOV - 04-AIOV - 04.1-AILB - 04.2-AIOV - YOU ARE HERE - 05.1-AILB - 05.2-AIOV - 05.3-AILB - 06-AIOV - 06.1-AILB - 06.2-AILB - 06.3-AILB - 06.4-AILB - 06.5-AILB - 06.6-AILB - 06.7-AILB - 07-AIOV - Heretics Methodology |
---|
Exploiting AI - Becoming an AI Hacker
Attack Type: [WhiteBox|BlackBox|Internal|External] A transfer model attack is a type of attack where an attacker uses a prompt injection from one machine learning model to exploit in another model. This is possible in situations where multiple models are trained on similar tasks or datasets. The attacker aims to manipulate a target model by using prompt injection flaws gained from a related model. These attacks often target models that are deployed in environments where robustness and security are critical, such as in facial recognition, natural language processing, and autonomous systems. For example, an attacker could generate adversarial images using one model and tests them against a different image classification model, leading to misclassifications. |
- The attacker selects a source model that has similar architecture or has been trained on similar data to the target model. This could be a publicly available model or one they have access to.
- The attacker generates adversarial examples using the source model. These are inputs designed to mislead the model into producing incorrect outputs. Techniques include:
- Fast Gradient Sign Method (FGSM)
- Projected Gradient Descent (PGD)
- The attacker evaluates the adversarial examples against the target model to see if they successfully cause misclassification or other undesired outcomes.
- If the adversarial examples transfer effectively, the attacker can use them to manipulate the target model's outputs in real-world scenarios.
- The attacker may refine the adversarial examples based on the responses from the target model, improving the attack's success rate.
This is an attack that could be relatively high depending on the if a AI model is being used in a wide array of areas. Fore example, if ChatGPT is being used internally by 40 different companies, all of these AI's are susceptible to the same prompt injection/vulnerability. Impact can range depending on various factors. The impact may be beyond user control due to public availability of the model.
- https://medium.com/google-developer-experts/cybersecurity-in-ai-transfer-learning-as-an-attack-vector-a6703b017337
- https://owasp.org/www-project-machine-learning-security-top-10/docs/ML07_2023-Transfer_Learning_Attack
- https://arxiv.org/abs/2310.17645
NEXT: 05.1-AILB
PREVIOUS: 04.1-AILB