        •   Utrecht University Student Theses Repository Home
        SonarQube rule violations that actually lead to bugs

        Thesis_Tijmen_van_den_Pol.pdf (1.141Mb)
        Publication date
        2021
        Author
        Pol, Tijmen van den
        Summary
        SonarQube is a popular free tool for automatically detecting bugs and code smells in code. Developers want to know which of the rules that SonarQube uses are important to resolve and which are less important. We conduct a study with a dataset that connects SonarQube rule violations to faults. This dataset is built with a more recent version of SonarQube than was used in earlier work. Machine learning (ML) is used to obtain feature importances that show which rules are really important. We examine these importances while taking into account that the dataset is imbalanced: we use oversampling and ML methods that deal well with imbalanced data. To determine the best rules by importance, we focus on rules that predict faults well and focus less on classifying the most entries in the dataset correctly, since the dataset is imbalanced. We do so by looking at G-mean and F-beta importances. Furthermore, we look into the difference between permutation and drop-column importance, since the paper that inspired this work (Lenarduzzi, Lomio, Huttunen, et al., 2020) used drop-column importance and we use permutation importance. We find that GradientBoost scores best on AUC, G-mean and F1-score, while RandomForest scores best on F5-score. We observe that SonarQube 7 rules do a good job of predicting faults. However, we also find that the variable importance assigned by an ML method differs from what we might consider important for a SonarQube rule. When we look at the importance of rule types, we find that bugs are relevant for predicting faults, which differs from the paper that this research was based on (Lenarduzzi, Lomio, Huttunen, et al., 2020).
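
The summary does not show how the metrics or importances were computed, so the following is only a minimal sketch of the general ideas, not the thesis's implementation. It assumes binary labels (1 = faulty, 0 = not faulty) and uses hypothetical helper names (`confusion_counts`, `g_mean`, `f_beta`, `permutation_importance`). G-mean is taken as the geometric mean of sensitivity and specificity, F-beta weights recall beta times as heavily as precision, and permutation importance is the average drop in a score when one feature column is shuffled while the fitted model is left unchanged. Drop-column importance, by contrast, would retrain the model without the column.

```python
import math
import random


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels where 1 marks a fault."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity (recall on faults) and specificity."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)


def f_beta(y_true, y_pred, beta=1.0):
    """F-beta score; beta > 1 favours recall over precision."""
    tp, _, fp, fn = confusion_counts(y_true, y_pred)
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def permutation_importance(predict, X, y, score, n_repeats=10, seed=0):
    """Mean drop in `score` when each feature column is shuffled in turn.

    `predict` maps one row (a list of feature values) to a 0/1 label.
    The model is never retrained; only the input data is perturbed.
    """
    rng = random.Random(seed)
    baseline = score(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - score(y, [predict(r) for r in X_perm]))
        importances.append(sum(drops) / n_repeats)
    return importances


# Toy imbalanced example: 2 faulty entries out of 10 (made-up data).
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
scores = (g_mean(y_true, y_pred), f_beta(y_true, y_pred, beta=5))

# Feature 0 fully determines the label; feature 1 is constant noise.
X = [[t, 0] for t in y_true]
importances = permutation_importance(lambda row: row[0], X, y_true, g_mean)
```

In the toy example, plain accuracy would be 0.8 even though only one of the two faults is found, which illustrates why metrics such as G-mean and F-beta are preferred over accuracy on imbalanced data.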
        URI
        https://studenttheses.uu.nl/handle/20.500.12932/284
        Collections
        • Theses