Uncovering Smart Contract Vulnerabilities: A Systematic Literature Review and a Deep Learning Approach to Predict Known and Unknown Threats.
Summary
In recent years, blockchain technology has gained massive popularity. As a result, smart contracts,
self-executing contracts with the terms of the agreement written directly into code and a crucial
component of blockchain platforms, have received growing interest. While they offer a variety of
advantages, they are also exposed to bugs and vulnerabilities, which can lead to security breaches
and financial losses, and concerns about smart contract security have therefore grown in prominence.
The majority of existing vulnerability detection tools rely on expert-defined criteria, which
are inefficient and do not scale. More importantly, expert-defined criteria are prone to
inaccuracy and can be deceived by attackers. In addition to these traditional detection tools,
researchers have applied machine learning methods to improve the security of smart contracts.
Although studies employing machine learning for vulnerability detection have made significant
progress, many challenges and areas for development remain.
The objective of the thesis is to comprehensively investigate machine learning-based approaches
for smart contract vulnerability detection. First, we conduct a systematic literature review of
machine learning approaches for automated vulnerability detection, which found a total of 183
papers. After applying inclusion and exclusion criteria, a snowballing process, quality assessment,
data extraction, and synthesis, 86 relevant papers were selected and analyzed. Categories,
algorithm trends, databases, combinations, and evaluation metrics were identified and analyzed to
showcase the work that has been conducted.
The results of the study reveal that many highly accurate tools and methods are available
for smart contract vulnerability detection. However, we also identified several limitations and
potential areas for future research. The major drawbacks include the inability to identify
completely unknown vulnerabilities, problems of scalability, the quality of the underlying dataset,
a limited identification range, and problems with feature extraction from bytecode and opcodes.
Based on these gaps, the thesis presents several solutions, including a CNN-BiLSTM with
Attention model that can detect both known and unknown vulnerabilities. This added capability
highlights how ML models can identify new patterns or vulnerabilities that static analysis
tools such as Slither may miss.
Our model combines multiple feature representations to improve the detection process: an opcode
frequency vector, a Word2Vec vector, and n-gram vectors. The combined representation captures the
frequency distribution, semantic associations, and context of the different opcode sequences in
order to identify vulnerabilities reliably. Moreover, the study assesses the performance of two
kinds of attention mechanisms, single attention and multi-head attention, on multi-label datasets.
Experimental results show that the proposed CNN-BiLSTM with Multihead Attention model is the best
configuration, achieving an accuracy of 87.08% and an F1 score of 74.66% in the known multi-label
vulnerability experiment, and 88% accuracy and a 91% F1 score in the unknown vulnerability
experiment. These results help to address issues of scalability, feature selection, and
generalization, and provide a basis for further studies on improving vulnerability detection
models.
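The way several opcode-level feature views can be merged into one input vector may be illustrated
with a minimal sketch. It assumes simple concatenation, a hypothetical eight-opcode vocabulary, and
a hand-picked bigram list; none of these details is taken from the thesis. The Word2Vec part is
represented only by a placeholder argument, because training an embedding is outside the scope of
a short example.

```python
from collections import Counter

# Hypothetical, tiny opcode vocabulary; a real one would cover the full EVM instruction set.
VOCAB = ["PUSH1", "MSTORE", "CALLVALUE", "DUP1", "ISZERO", "JUMPI", "CALL", "SSTORE"]


def frequency_vector(opcodes, vocab=VOCAB):
    """Relative frequency of each vocabulary opcode in the sequence."""
    counts = Counter(opcodes)
    total = len(opcodes) or 1  # avoid division by zero on an empty sequence
    return [counts[op] / total for op in vocab]


def ngram_vector(opcodes, ngram_vocab, n=2):
    """Occurrence counts of the selected opcode n-grams (bigrams by default)."""
    grams = Counter(tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1))
    return [float(grams[g]) for g in ngram_vocab]


def combine(opcodes, ngram_vocab, embedding=None):
    """Concatenate the feature views into a single vector.

    `embedding` is a placeholder for a pooled Word2Vec vector of the sequence;
    it is passed in rather than computed here (assumption for illustration).
    """
    embedding = list(embedding) if embedding is not None else []
    return frequency_vector(opcodes) + embedding + ngram_vector(opcodes, ngram_vocab)


sequence = ["PUSH1", "PUSH1", "MSTORE", "CALLVALUE", "DUP1", "ISZERO", "JUMPI", "CALL"]
bigrams = [("PUSH1", "PUSH1"), ("PUSH1", "MSTORE"), ("CALL", "SSTORE")]
features = combine(sequence, bigrams, embedding=[0.0, 0.0])
print(len(features))  # 8 frequency entries + 2 placeholder entries + 3 bigram counts
```

In this sketch the frequency view records how often each opcode occurs, while the n-gram view keeps
some local ordering information that plain frequencies lose. Whether the thesis concatenates the
views or fuses them in another way is not stated in this summary.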
In addition, although the present research concentrates on smart contracts, the approaches and
conclusions can potentially be applied to other types of source code, thereby providing a basis for
future work in software vulnerability identification.