Show simple item record

dc.rights.license: CC-BY-NC-ND
dc.contributor.advisor: Klein, Dominik
dc.contributor.author: Eiermann, Marcello
dc.date.accessioned: 2024-02-27T00:01:11Z
dc.date.available: 2024-02-27T00:01:11Z
dc.date.issued: 2024
dc.identifier.uri: https://studenttheses.uu.nl/handle/20.500.12932/46064
dc.description.abstract: Deep learning-based image recognition systems have become essential in a variety of applications, including autonomous driving functions in vehicles. The increased use of third-party datasets and pretrained models opens up a new security risk, since users cannot know whether the data or model has been manipulated. Attackers can plant a backdoor during the training phase by poisoning part of the dataset with a trojan trigger. The trojaned model behaves normally on benign inputs, but inputs containing the trigger cause the model to deliberately produce a wrong output. In the domain of autonomous vehicles, an attack that causes an intentional misclassification of a road sign could have fatal consequences. One method for detecting neural network trojans is Artificial Brain Stimulation (ABS), which artificially stimulates a neuron's activation value and observes the resulting change in the output activations. We combine ABS with the neural network abstraction tool DeepAbstract, which computes clusters and cluster representatives based on the input/output similarity of neurons. Our strategy is to apply the ABS analysis only to the subset of cluster representatives, potentially reducing the computational load and increasing detection accuracy. To assess the efficacy of our method, we conducted experiments on the GTSRB dataset, trojaning multiple models with six distinct triggers of varying visibility. We analyze two research questions: whether our method can improve runtime compared to ABS, and whether it can increase detection accuracy. One model showed an improvement in stimulation runtime, while the runtime of the other models remained equal. Across all tested models, our method consistently yields detection accuracy superior or equal to that of ABS. At best, our method increased the reverse-engineered attack success rate score by 33% and the number of detected trojaned neurons by 59%, demonstrating a clear improvement in detection accuracy.
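The abstract describes two combined steps: grouping neurons into clusters by input/output similarity (DeepAbstract) and then stimulating only the cluster representatives while watching for output flips (ABS). The following is a rough illustrative sketch only — the toy network, the crude mean-activation grouping, and the flip-count heuristic are assumptions for exposition, not the actual ABS or DeepAbstract algorithms from the thesis:

```python
import numpy as np

# Toy 2-layer network; all weights are illustrative assumptions.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 8))   # input -> hidden (8 neurons)
W2 = rng.normal(size=(8, 3))   # hidden -> output (3 classes)

def forward(x, stim_neuron=None, stim_value=None):
    """Forward pass; optionally override one hidden neuron's activation."""
    h = np.maximum(x @ W1, 0.0)        # ReLU hidden activations
    if stim_neuron is not None:
        h = h.copy()
        h[stim_neuron] = stim_value    # artificial stimulation
    return h @ W2                      # output logits

# Step 1 (DeepAbstract-like, greatly simplified): group hidden neurons
# by similarity of their activations over sample inputs, then pick one
# representative per group.
X = rng.normal(size=(50, 4))
H = np.maximum(X @ W1, 0.0)            # 50 x 8 activation matrix
order = np.argsort(H.mean(axis=0))     # order neurons by mean activation
clusters = np.array_split(order, 4)    # 4 clusters of neurons
representatives = [int(c[len(c) // 2]) for c in clusters]

# Step 2 (ABS-like): sweep the activation of each representative and
# count how often the predicted class flips; a neuron whose stimulation
# reliably flips the output is a trojan candidate.
x = X[0]
base_pred = int(np.argmax(forward(x)))
flip_counts = {}
for n in representatives:
    flip_counts[n] = sum(
        int(np.argmax(forward(x, stim_neuron=n, stim_value=v))) != base_pred
        for v in np.linspace(0.0, 20.0, 21)
    )

# Neurons ranked by how easily stimulation changes the prediction.
print(sorted(flip_counts.items(), key=lambda kv: -kv[1]))
```

Stimulating only the 4 representatives instead of all 8 neurons halves the sweep work here; the thesis's reported runtime and accuracy gains come from applying this idea at the scale of real GTSRB models.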
dc.description.sponsorship: Utrecht University
dc.language.iso: EN
dc.subject: Improving neural network trojan detection via network abstraction
dc.title: Improving neural network trojan detection via network abstraction
dc.type.content: Master Thesis
dc.rights.accessrights: Open Access
dc.subject.keywords: Machine learning; Adversarial ML; Explainable AI; Trojan attacks
dc.subject.courseuu: Artificial Intelligence
dc.thesis.id: 28500

