        Improving neural network trojan detection via network abstraction

        View/Open
        Thesis_Eiermann_Neural_Trojan_Detection.pdf (6.559Mb)
        Publication date
        2024
        Author
        Eiermann, Marcello
        Summary
        Deep learning-based image recognition systems have become essential in a variety of applications, including autonomous driving functions in vehicles. The increased use of third-party datasets and pretrained models opens up a new security risk, because users cannot know whether the data or model has been manipulated. Attackers can plant a backdoor during the training phase by poisoning part of the dataset with a trojan trigger. The trojaned model behaves normally on benign inputs, but inputs that contain the trigger cause the model to produce the wrong output the attacker intended. In the domain of autonomous vehicles, an attack that causes an intentional misclassification of a road sign could have fatal consequences. One method for detecting neural network trojans is Artificial Brain Stimulation (ABS), which manually stimulates a neuron's activation value and observes the change in the output activation values. We combine ABS with the neural network abstraction tool DeepAbstract, which computes clusters and cluster representatives based on the input/output similarity of neurons. Our strategy applies the ABS analysis selectively to the subset of cluster representatives, with the aim of reducing the computational load and increasing the detection accuracy. To assess the efficacy of our method, we conducted experiments on the GTSRB dataset, trojaning multiple models with six distinct triggers of varying visibility. We analyze two research questions: whether our method can improve runtime compared to ABS, and whether it can increase the detection accuracy. One model showed an improvement in stimulation runtime, while the runtime of the other models remained equal. Across all tested models, our method yields detection accuracy superior or equivalent to that of ABS. In the best case, our method increased the reverse-engineered attack success rate score by 33% and the number of detected trojaned neurons by 59%, demonstrating a clear improvement in detection accuracy.
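
        The idea described in the summary, stimulating only cluster representatives rather than every neuron, can be illustrated with a minimal sketch. This is not the thesis's implementation: the toy two-layer network, the function names (`stimulate`, `cluster_representatives`), and the plain k-means that stands in for DeepAbstract's clustering are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy network: 8 inputs -> 6 hidden ReLU neurons -> 3 classes.
W1 = rng.normal(size=(8, 6))
W2 = rng.normal(size=(6, 3))


def hidden_activations(x):
    """ReLU activations of the hidden layer for a batch of inputs."""
    return np.maximum(x @ W1, 0.0)


def stimulate(x, neuron, levels):
    """Fix one hidden neuron to each value in `levels` and return the
    mean output logits per level, shape (len(levels), n_classes)."""
    h = hidden_activations(x)
    out = []
    for v in levels:
        h_stim = h.copy()
        h_stim[:, neuron] = v  # override this neuron's activation value
        out.append((h_stim @ W2).mean(axis=0))
    return np.array(out)


def cluster_representatives(acts, k, iters=20):
    """Plain k-means over neurons (columns of `acts`). Returns, for each
    non-empty cluster, the index of the neuron closest to its centroid."""
    pts = acts.T                    # one row per neuron
    centroids = pts[:k].copy()      # deterministic initialisation
    for _ in range(iters):
        d = np.linalg.norm(pts[:, None] - centroids[None], axis=2)
        labels = d.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centroids[c] = pts[labels == c].mean(axis=0)
    reps = []
    for c in range(k):
        members = np.flatnonzero(labels == c)
        if members.size:
            dist = np.linalg.norm(pts[members] - centroids[c], axis=1)
            reps.append(int(members[dist.argmin()]))
    return sorted(reps)


x = rng.normal(size=(32, 8))           # stand-in for benign input images
levels = np.linspace(0.0, 10.0, 5)     # stimulation values to try
reps = cluster_representatives(hidden_activations(x), k=3)

# Stimulate only the representatives instead of all 6 hidden neurons.
responses = {n: stimulate(x, n, levels) for n in reps}
```

        In this sketch, a neuron whose stimulation drives one class logit far above the others regardless of the input would be a candidate for closer inspection; how candidates are actually scored and confirmed is left to the thesis itself.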
        URI
        https://studenttheses.uu.nl/handle/20.500.12932/46064
        Collections
        • Theses