Active learning in recommender systems for predicting vulnerabilities in software

Stijger, Elise

dc.rights.license	CC-BY-NC-ND
dc.contributor.advisor	Krempl, G.M.
dc.contributor.author	Stijger, Elise
dc.date.accessioned	2024-01-06T00:01:00Z
dc.date.available	2024-01-06T00:01:00Z
dc.date.issued	2024
dc.identifier.uri	https://studenttheses.uu.nl/handle/20.500.12932/45783
dc.description.abstract	Due to a rapid advancement of digital technology and growing reliance on the internet, cybersecurity has become a paramount issue for individuals, organizations, and governments. To address this challenge, penetration testing has emerged as a critical tool to ensure the security of computer systems and networks. The reconnaissance phase of penetration testing plays a crucial role in identifying vulnerabilities in a system by gathering relevant information. Although various tools are available to automate this process, most of them are limited to identifying reported vulnerabilities, and they do not provide suggestions or predictions about vulnerabilities. Therefore, this research aims to investigate the application of recommender systems to predict common vulnerabilities during the reconnaissance phase. The main objective of this research is to investigate how active learning affects the performance of a recommender system to identify vulnerabilities in software products. Item-Based k-NN Collaborative Filtering, a recommender system, can improve the identification of potential vulnerabilities and the effectiveness of penetration testing by analyzing information from similar data points. This research involves a comprehensive data preprocessing phase, which utilizes data from the National Vulnerability Database (NVD). Several recommender systems are built using this data, which enables the prediction of potential vulnerabilities during the reconnaissance phase of penetration testing. The performances of these recommender systems are evaluated, and the topperforming recommender system implements active learning to enhance its performance. The findings of this research demonstrate that Item-Based k-NN Collaborative Filtering outperforms other recommender systems in terms of overall performance when it comes to identifying software vulnerabilities. Furthermore, when compared to Item-Based k-NN Collaborative Filtering prior to active learning or with active learning and a random sampling technique, Item-Based k-NN Collaborative Filtering with active learning incorporating a 4- or 10-batch sampling technique with 20 or 40 items added yields a statistically significant improvement in the precision score. This indicates that a greater proportion of the predicted vulnerabilities are correct. Item-Based k-NN Collaborative Filtering with active learning and a single-batch sampling strategy only results in a statistically significant improvement in precision, compared to Item-Based k-NN Collaborative Filtering prior active learning or with active learning and a random sampling technique, when 20 items are added instead of 40. Furthermore, only Item-Based k-NN Collaborative Filtering with a 10-batch sampling strategy adding 20 items demonstrated a statistically significant improvement in nDCG scores compared to Item-Based k-NN Collaborative Filtering prior to active learning. This implies a more accurate ranking of the vulnerabilities. However, this could potentially be a type I error. From these findings, it can be concluded that introducing active learning in Item-Based k-NN Collaborative Filtering, using the approaches outlined, leads to significant improvement in precision score but not necessarily in nDCG score. Considering this conclusion, it is advised to use Item-Based k-NN Collaborative Filtering with active learning to predict vulnerabilities in software products and enhance the reconnaissance phase of penetration testing. This can be achieved by incorporating a single-batch sampling technique with 20 items added or a 4- or 10-batch sampling technique with 20 or 40 added. The insights gained from this research can help individuals, organizations, and governments strengthen their cybersecurity defences and protect against potential cyber threats.
dc.description.sponsorship	Utrecht University
dc.language.iso	EN
dc.subject	Investigate how active learning affects the performance of a recommender system to identify vulnerabilities in software products.
dc.title	Active learning in recommender systems for predicting vulnerabilities in software
dc.type.content	Master Thesis
dc.rights.accessrights	Open Access
dc.subject.keywords	active learning, recommender system, collaborative filtering
dc.subject.courseuu	Artificial Intelligence
dc.thesis.id	26883

Files in this item

Name:: Thesis_Elise_Stijger_Active ...
Size:: 5.409Mb
Format:: PDF

View/Open

This item appears in the following Collection(s)

Theses

Show simple item record