Uncovering Smart Contract Vulnerabilities: A Systematic Literature Review and a Deep Learning Approach to Predict Known and Unknown Threats.

Stasinos, Stylianos

Publication date

2025

Author

Stasinos, Stylianos

Summary

In recent years, blockchain technology has gained massive popularity. As a result, smart contracts, which are self-executing contracts with the terms of the agreement written directly into code and a crucial component of blockchain technology, have received growing interest. While smart contracts offer a variety of advantages, they are also exposed to bugs and vulnerabilities that can lead to security breaches and financial losses, and concerns about their security have grown accordingly. The majority of existing vulnerability detection tools rely on expert-defined criteria, which are inefficient and do not scale. More importantly, expert-defined criteria are prone to inaccuracy and can be evaded by attackers. Beyond these traditional detection tools, researchers have applied machine learning methods to improve the security of smart contracts. Although research using machine learning for vulnerability detection has made significant progress, many challenges and areas for development remain. The objective of this thesis is to comprehensively investigate machine learning-based approaches for smart contract vulnerability detection. First, we conduct a systematic literature review of machine learning approaches for automated vulnerability detection, finding a total of 183 papers. After applying inclusion and exclusion criteria, a snowballing process, quality assessment, data extraction, and synthesis, 86 relevant papers were selected and analyzed. Categories, algorithm trends, databases, combinations, and evaluation metrics were identified and analyzed to showcase the work that has been conducted. The results of the review reveal that many highly accurate tools and methods are available for smart contract vulnerability detection. However, we also identified several limitations and potential areas for future research.
The major drawbacks include the inability to identify completely unknown vulnerabilities, scalability problems, the quality of the underlying datasets, a limited identification range, and difficulties with feature extraction from bytecode and opcodes. Based on these gaps, the thesis presents several solutions, including a CNN-BiLSTM with Attention model that can detect both known and unknown vulnerabilities. This added capability highlights how ML models can identify new patterns or vulnerabilities that static analysis tools such as Slither may miss. Our model combines multiple feature representations to improve detection: an opcode frequency vector, a Word2Vec vector, and n-gram vectors. The combined representation captures the frequency distribution, semantic associations, and context of the opcode sequences in order to identify vulnerabilities reliably. Moreover, the study assesses the performance of two kinds of attention mechanisms, in single- and multi-attention configurations, on multi-label datasets. Experimental results show that the proposed CNN-BiLSTM with Multihead Attention model is the best configuration, achieving an accuracy of 87.08% and an F1 score of 74.66% in the known multi-label vulnerability experiment, and 88% accuracy and a 91% F1 score in the unknown vulnerability experiment. These results help to address issues of scale, feature selection, and generalization, and they provide a basis for further studies on improving vulnerability detection models. In addition, although the present research concentrates on smart contracts, the approaches and conclusions can potentially be applied to other types of source code, thereby providing a basis for future work in software vulnerability identification.
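
As a rough illustration of the combined representation described in the summary, the following minimal Python sketch builds an opcode frequency vector and a bigram count vector from an opcode sequence and concatenates them. It is not the thesis implementation: the function names, the toy opcode sequence, and the small vocabularies are hypothetical, and the Word2Vec component is left out because it requires a trained embedding model.

```python
from collections import Counter

def opcode_frequency(opcodes, vocab):
    """Relative frequency of each vocabulary opcode in the sequence."""
    counts = Counter(opcodes)
    total = len(opcodes) or 1  # avoid division by zero on empty input
    return [counts[op] / total for op in vocab]

def ngram_counts(opcodes, n, ngram_vocab):
    """Raw count of each n-gram from ngram_vocab in the sequence."""
    # zip over n shifted copies of the sequence yields all contiguous n-grams
    grams = Counter(zip(*(opcodes[i:] for i in range(n))))
    return [float(grams[g]) for g in ngram_vocab]

def combined_features(opcodes, vocab, ngram_vocab, n=2):
    """Concatenate frequency and n-gram vectors into one feature vector.

    Assumption: a Word2Vec-style embedding would be appended here as well;
    it is omitted in this sketch.
    """
    return opcode_frequency(opcodes, vocab) + ngram_counts(opcodes, n, ngram_vocab)

# Hypothetical toy data, chosen only for illustration.
opcodes = ["PUSH1", "PUSH1", "MSTORE", "CALLVALUE",
           "DUP1", "ISZERO", "PUSH1", "JUMPI"]
vocab = ["PUSH1", "MSTORE", "CALLVALUE", "DUP1", "ISZERO", "JUMPI"]
ngram_vocab = [("PUSH1", "PUSH1"), ("PUSH1", "MSTORE"), ("PUSH1", "JUMPI")]

features = combined_features(opcodes, vocab, ngram_vocab)
print(features)
```

In a full pipeline, a vector of this kind would presumably be computed per contract and passed to the classifier; how the thesis scales, pads, or orders the individual components is not stated in the summary.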

URI

https://studenttheses.uu.nl/handle/20.500.12932/48468

Collections

Theses