SonarQube rule violations that actually lead to bugs
Summary
SonarQube is a popular free tool for automatically detecting bugs and code smells
in code. For developers, it is interesting to know which of the rules that SonarQube
checks are important to resolve and which are less important.
We conduct a study on a dataset that connects SonarQube rule violations to
faults. This dataset is built using a more recent version of SonarQube than in earlier
work. Machine Learning (ML) is used to obtain feature importances that show which
rules are really important. We examine these importances while taking into account
that the dataset is imbalanced: we use oversampling and ML methods that are good at
dealing with imbalanced data.
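To illustrate what such a pipeline looks like, the following is a minimal Python sketch, assuming the imbalanced-learn and scikit-learn libraries; the synthetic data stands in for the actual violation dataset and is not part of this study:

import numpy as np
from imblearn.over_sampling import RandomOverSampler
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in: violation counts per rule and an imbalanced label.
rng = np.random.default_rng(0)
X = rng.integers(0, 5, size=(1000, 20))    # rule-violation counts
y = (rng.random(1000) < 0.05).astype(int)  # ~5% fault-inducing entries

X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

# Duplicate minority-class samples until both classes are balanced,
# then train a boosting model on the resampled training set.
X_res, y_res = RandomOverSampler(random_state=0).fit_resample(X_train, y_train)
clf = GradientBoostingClassifier(random_state=0).fit(X_res, y_res)
print(clf.score(X_test, y_test))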
When determining the best rules by their importances, we focus on rules that can
predict faults well rather than on classifying as many entries in the dataset as possible
correctly, since the dataset is imbalanced. We do so by looking at importances based on
the G-mean and the F-beta score.
Furthermore, we look into the difference between permutation and drop-column
importance, since the paper that this work is inspired by (Lenarduzzi, Lomio, Huttunen,
et al., 2020) used drop-column importance, whereas we use permutation importance.
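The distinction, sketched below in Python under the assumption of a scikit-learn model (the data and model here are illustrative placeholders), is that permutation importance shuffles one feature at a time on an already fitted model, while drop-column importance retrains the model once per removed feature:

import numpy as np
from sklearn.base import clone
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = (X[:, 0] > 0).astype(int)  # only feature 0 carries signal
clf = RandomForestClassifier(random_state=0).fit(X, y)

# Permutation importance: shuffle one column at a time, no retraining.
perm = permutation_importance(clf, X, y, n_repeats=10, random_state=0)
print("permutation:", perm.importances_mean)

# Drop-column importance: retrain with each column removed and measure
# how much the (here: training) score drops relative to the full model.
base = clf.score(X, y)
drop = []
for i in range(X.shape[1]):
    X_i = np.delete(X, i, axis=1)
    drop.append(base - clone(clf).fit(X_i, y).score(X_i, y))
print("drop-column:", drop)

Drop-column importance is therefore considerably more expensive, since it requires one retraining per feature, which is one practical reason to prefer permutation importance.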
We find that GradientBoost scores best on AUC, G-mean, and F1-score, while
RandomForest scores best on the F5-score. We observe that SonarQube 7 rules do a good
job of predicting faults. However, we also find that the variable importance an ML
method assigns can differ from what we would consider important for a SonarQube rule.
When we look at the importance of rule types, we find that rules of the Bug type are
relevant for predicting faults, which differs from the paper that this research was based
on (Lenarduzzi, Lomio, Huttunen, et al., 2020).