Uncovering Smart Contract Vulnerabilities: A Systematic Literature Review and a Deep Learning Approach to Predict Known and Unknown Threats.

Stasinos, Stylianos

dc.rights.license	CC-BY-NC-ND
dc.contributor.advisor	Jansen, Slinger
dc.contributor.author	Stasinos, Stylianos
dc.date.accessioned	2025-02-06T00:01:43Z
dc.date.available	2025-02-06T00:01:43Z
dc.date.issued	2025
dc.identifier.uri	https://studenttheses.uu.nl/handle/20.500.12932/48468
dc.description.abstract	In recent years, blockchain technology has gained massive popularity. As a result, smart contracts, self-executing contracts whose terms are written directly into code and which are a crucial component of blockchain platforms, have attracted growing interest. While smart contracts offer a variety of advantages, they are also exposed to bugs and vulnerabilities that can lead to security breaches and financial losses, and concerns about their security have grown accordingly. The majority of existing vulnerability detection tools rely on expert-defined criteria, which are inefficient and do not scale. More importantly, expert-defined criteria are prone to inaccuracy and can be deceived by attackers. Beyond these traditional detection tools, experts have applied machine learning methods to improve the security of smart contracts. Although research applying machine learning to vulnerability detection has made significant progress, many challenges and areas for development remain. The objective of this thesis is to comprehensively investigate machine learning-based approaches for smart contract vulnerability detection. First, we conduct a systematic literature review of machine learning approaches for automated vulnerability detection, which initially found a total of 183 papers. After applying inclusion and exclusion criteria, snowballing, quality assessment, data extraction, and synthesis, 86 relevant papers were selected and analyzed. Categories, algorithm trends, databases, combinations, and evaluation metrics were identified and analyzed to showcase the work that has been conducted. The results of the study reveal that many highly accurate tools and methods are available for smart contract vulnerability detection. However, we also identified several limitations and potential areas for future research. 
The major drawbacks include the inability to identify completely unknown vulnerabilities, scalability problems, the quality of the underlying datasets, a limited identification range, and difficulties with feature extraction from bytecode and opcodes. To address these gaps, the thesis presents several solutions, including a CNN-BiLSTM with Attention model that can detect both known and unknown vulnerabilities. This added capability highlights how ML models can identify new patterns or vulnerabilities that static analysis tools such as Slither may miss. Our model improves detection by combining multiple feature representations: an opcode frequency vector, a Word2Vec vector, and n-gram vectors. The combined representation captures the frequency distribution, semantic associations, and context of the opcode sequences to identify vulnerabilities reliably. Moreover, the study assesses the performance of two kinds of attention mechanisms, single-head and multi-head, on multi-label datasets. Experimental results confirm that the proposed CNN-BiLSTM with Multihead Attention model is the best configuration, achieving an accuracy of 87.08% and an F1 score of 74.66% in the known multi-label vulnerability experiment, and 88% accuracy and a 91% F1 score in the unknown vulnerability experiment. These results help to address issues of scalability, feature selection, and generalization, and provide a basis for further studies on improving vulnerability detection models. In addition, although the present research concentrates on smart contracts, the approaches and conclusions can potentially be applied to all types of source code, thereby providing a basis for future work in software vulnerability identification.
dc.description.sponsorship	Utrecht University
dc.language.iso	EN
dc.subject	The thesis explores machine learning approaches for detecting vulnerabilities in smart contracts, addressing limitations of existing tools such as scalability, dataset quality, and unknown vulnerability identification. It introduces a CNN-BiLSTM with Multihead Attention model that leverages opcode frequency, Word2Vec, and n-gram vectors for multi-label classification. It also includes a systematic literature review of machine learning methods for smart contract vulnerability detection.
dc.title	Uncovering Smart Contract Vulnerabilities: A Systematic Literature Review and a Deep Learning Approach to Predict Known and Unknown Threats.
dc.type.content	Master Thesis
dc.rights.accessrights	Open Access
dc.subject.courseuu	Artificial Intelligence
dc.thesis.id	42742
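
The abstract describes combining an opcode frequency vector with n-gram vectors (and a Word2Vec vector) into one representation. The following is a minimal sketch of that idea only, not the thesis's implementation: the toy vocabularies `VOCAB` and `NGRAM_VOCAB`, the function names, and the choice of bigrams are all assumptions made for illustration, and the Word2Vec component is omitted so the example needs nothing beyond the Python standard library.

```python
from collections import Counter
from typing import List, Sequence, Tuple

# Hypothetical toy vocabulary (assumption); a real pipeline would likely
# derive it from the full EVM opcode set or from the training corpus.
VOCAB: List[str] = ["PUSH1", "MSTORE", "CALLVALUE", "DUP1", "ISZERO", "JUMPI", "CALL", "SSTORE"]

# Hypothetical bigram vocabulary (assumption), fixed so every contract
# maps to a vector of the same length.
NGRAM_VOCAB: List[Tuple[str, ...]] = [
    ("PUSH1", "PUSH1"),
    ("PUSH1", "MSTORE"),
    ("CALL", "SSTORE"),
]


def opcode_frequency(opcodes: Sequence[str], vocab: Sequence[str]) -> List[float]:
    """Relative frequency of each vocabulary opcode in the sequence."""
    counts = Counter(opcodes)
    total = len(opcodes) or 1  # avoid division by zero for empty input
    return [counts[op] / total for op in vocab]


def ngram_counts(
    opcodes: Sequence[str], ngram_vocab: Sequence[Tuple[str, ...]], n: int = 2
) -> List[float]:
    """Raw count of each vocabulary n-gram over a sliding window of size n."""
    grams = Counter(tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1))
    return [float(grams[g]) for g in ngram_vocab]


def combined_features(
    opcodes: Sequence[str],
    vocab: Sequence[str] = VOCAB,
    ngram_vocab: Sequence[Tuple[str, ...]] = NGRAM_VOCAB,
    n: int = 2,
) -> List[float]:
    """Concatenate the frequency vector and the n-gram count vector."""
    return opcode_frequency(opcodes, vocab) + ngram_counts(opcodes, ngram_vocab, n)


# Illustrative opcode sequence (made up for this sketch, not taken from the thesis).
sample = ["PUSH1", "PUSH1", "MSTORE", "CALLVALUE", "DUP1", "ISZERO", "PUSH1", "JUMPI"]
features = combined_features(sample)
```

In this sketch the two parts are simply concatenated, so the resulting vector has one entry per vocabulary opcode followed by one entry per vocabulary n-gram. How the thesis actually weights, normalizes, or fuses the three representations is not stated in this record.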

Files in this item

Name:: Enhancing_Smart_Contract_Secur ...
Size:: 3.517Mb
Format:: PDF

This item appears in the following Collection(s)

Theses