Analyzing transferability of adversarial attacks across neural architectures
Chowdhury, Anika (2025-06-12)
© 2025 Anika Chowdhury. Unless otherwise noted, reuse is permitted under the Creative Commons Attribution 4.0 International (CC BY 4.0) licence (https://creativecommons.org/licenses/by/4.0/). Reuse is permitted provided that the source is appropriately credited and any changes are indicated. The use or reproduction of parts that are not the property of the author(s) may require permission directly from the respective rights holders.
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:oulu-202506124403
Abstract
Adversarial attacks are a serious threat to deep neural networks, as they undermine their reliability in real-world, security-sensitive applications of artificial intelligence. The aim of this research is to investigate the transferability of adversarial inputs across several convolutional neural network image classifiers on three datasets: CIFAR-10, MNIST, and ImageNet. All adversarial examples are generated on a ResNet18 source model using the white-box attack methods FGSM, PGD, and C&W, and are then used to fool three target models: VGG-16, DenseNet-121, and EfficientNetB0. The adversarial examples generated on the source model significantly degraded the performance of the target models, indicating a high transferability rate.
For the CIFAR-10 dataset, transferability rates often exceeded 90%, alongside high attack success rates, with PGD reaching up to 94.81%. On MNIST, the source model showed striking robustness, yet the transferability results revealed a critical vulnerability. On ImageNet, all models were severely affected, with nearly all attacks achieving 99% success rates in misleading the target models; this can be explained by the fact that the pretrained models used in the experiment were trained on the same dataset. These results underline the threat that black-box attacks pose to state-of-the-art deep neural network models. The adversarial examples are model-agnostic: inputs generated on one model can cause other models to misclassify regardless of their architectural differences. The high attack strength and transferability of the ImageNet-based attacks suggest a correlation between dataset complexity and the effectiveness of the generated adversarial examples. This thesis highlights the critical need for defense strategies that generalize across different architectures.
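As context for the setup described above, the following is a minimal sketch of how such a transfer attack could be evaluated. It assumes PyTorch and torchvision (the abstract does not state the framework) with ImageNet-pretrained ResNet18 and VGG-16 weights, uses FGSM as the attack, and omits data loading and input normalization; the helper names fgsm and transfer_rate are illustrative, and the thesis's exact transferability metric may be defined differently.

```python
import torch
import torch.nn.functional as F
from torchvision import models

# Source model: adversarial examples are crafted on this one (white-box access).
# Target model: attacked only through the transferred inputs (black-box setting).
source = models.resnet18(weights="IMAGENET1K_V1").eval()
target = models.vgg16(weights="IMAGENET1K_V1").eval()

def fgsm(model, x, y, eps):
    """Single-step FGSM: perturb x along the sign of the loss gradient."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    x_adv = x + eps * x.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()

def transfer_rate(x, y, eps=8 / 255):
    """Fraction of source-crafted adversarial inputs that the target misclassifies
    (one common proxy for transferability; other definitions exist)."""
    x_adv = fgsm(source, x, y, eps)
    with torch.no_grad():
        pred = target(x_adv).argmax(dim=1)
    return (pred != y).float().mean().item()
```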
Collections
- Open access [38829]