        Utrecht University Student Theses Repository


        Out with the Old and in with the New? - A Comparison of Classical vs. State-of-the-Art Feature Extractors in the Context of Systematic Reviews

        File
        Ana_Caklovic_Thesis_Osiris.pdf (1.3 MB)
        Publication date
        2022
        Author
        Caklovic, Ana
        Summary
        Feature extraction is the process of transforming raw data into the features on which a model is trained, while preserving as much of the original information as possible. The choice of feature extractor can greatly affect a classifier's performance. Feature extraction has evolved from older techniques such as tf-idf and Doc2Vec to transformers pre-trained on large corpora. However, although the newer techniques seem promising, it is not always clear when and why one feature extractor outperforms another. This study examines whether state-of-the-art feature extractors (namely the transformers RoBERTa, MPNet, and SPECTER) can outperform classical feature extractors (tf-idf and Doc2Vec) when classifying research articles as relevant or irrelevant for systematic reviews. The study ran multiple simulations with the ASReview software to measure how well the different feature extractors, each combined with various classifiers, classified research articles as relevant or irrelevant. The results indicated that tf-idf combined with a Naive Bayes classifier outperformed all other combinations, including those using sentence-transformer feature extractors.
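        To make the best-performing combination concrete, here is a minimal plain-Python sketch of what a tf-idf + Naive Bayes screening simulation does: tf-idf weights (using the smoothed idf that scikit-learn applies by default), a multinomial Naive Bayes classifier with Laplace smoothing, and a loop that repeatedly retrains on the screened records and screens the highest-scoring unscreened record next, with known labels standing in for the human reviewer. This is not ASReview's implementation or the thesis's code. The function names, the whitespace tokenizer, and the toy records and labels are hypothetical, and the real software has further configurable components not modeled here.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase + whitespace split (real pipelines do more).
    return text.lower().split()


def tfidf_vectors(docs):
    """One {term: weight} dict per document: raw term count * smoothed idf.

    idf(t) = ln((1 + N) / (1 + df(t))) + 1, the smoothed form scikit-learn uses
    by default; no length normalization is applied in this sketch.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log((1 + n_docs) / (1 + n)) + 1 for t, n in df.items()}
    return [{t: c * idf[t] for t, c in Counter(toks).items()} for toks in tokenized]


def train_nb(vectors, labels, alpha=1.0):
    """Multinomial Naive Bayes with Laplace smoothing, fit on tf-idf weights.

    labels: 1 = relevant, 0 = irrelevant; both classes must be present.
    """
    vocab = {t for v in vectors for t in v}
    model = {}
    for cls in (0, 1):
        rows = [v for v, y in zip(vectors, labels) if y == cls]
        totals = Counter()
        for v in rows:
            totals.update(v)  # sums float weights per term
        denom = sum(totals.values()) + alpha * len(vocab)
        log_prior = math.log(len(rows) / len(vectors))
        log_lik = {t: math.log((totals[t] + alpha) / denom) for t in vocab}
        unseen = math.log(alpha / denom)  # terms not in the training vocabulary
        model[cls] = (log_prior, log_lik, unseen)
    return model


def relevance_score(model, vec):
    """Log-odds of 'relevant' vs 'irrelevant' for one tf-idf vector."""
    logp = {}
    for cls, (log_prior, log_lik, unseen) in model.items():
        logp[cls] = log_prior + sum(w * log_lik.get(t, unseen) for t, w in vec.items())
    return logp[1] - logp[0]


def simulate_screening(docs, labels, prior_idx):
    """Replay screening with known labels acting as the human reviewer.

    Features are extracted once for all records; each round the classifier is
    retrained on the screened records and the highest-scoring unscreened record
    is screened next. Returns the screening order (record indices).
    """
    vectors = tfidf_vectors(docs)
    screened = list(prior_idx)
    while len(screened) < len(docs):
        model = train_nb([vectors[i] for i in screened], [labels[i] for i in screened])
        pool = [i for i in range(len(docs)) if i not in screened]
        screened.append(max(pool, key=lambda i: relevance_score(model, vectors[i])))
    return screened


# Hypothetical toy records and inclusion labels (1 = relevant).
docs = [
    "active learning for systematic review screening",
    "deep learning for protein folding",
    "machine learning to screen abstracts in systematic reviews",
    "soil erosion in alpine valleys",
    "text classification reduces screening workload in reviews",
    "history of medieval trade routes",
]
labels = [1, 0, 1, 0, 1, 0]
order = simulate_screening(docs, labels, prior_idx=[0, 1])
print(order)  # → [0, 1, 4, 2, 3, 5]
```

        Starting from one relevant and one irrelevant seed record, the sketch screens the two remaining relevant records (indices 4 and 2) before any irrelevant one. Note that multinomial Naive Bayes assumes non-negative feature values, which tf-idf weights satisfy but dense transformer embeddings generally do not; this is one reason the feature extractor and the classifier have to be evaluated as a combination rather than separately.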
        URI
        https://studenttheses.uu.nl/handle/20.500.12932/42409
        Collections
        • Theses