dc.rights.license: CC-BY-NC-ND
dc.contributor: Rens van de Schoot, Duco Veen
dc.contributor.advisor: Schoot, Rens van de
dc.contributor.author: Willems, Sjard
dc.date.accessioned: 2024-07-26T00:02:07Z
dc.date.available: 2024-07-26T00:02:07Z
dc.date.issued: 2024
dc.identifier.uri: https://studenttheses.uu.nl/handle/20.500.12932/46956
dc.description.abstract: Active learning enhances efficiency in systematic reviews by optimizing the work saved over random sampling (WSS) and identifying relevant papers. This study investigates the impact of various preprocessing techniques on the performance of active learning models. Specifically, it evaluates the effectiveness of TF-IDF, SBERT, and Doc2Vec embeddings combined with different normalization and scaling methods, using Naive Bayes and logistic regression classifiers. The findings indicate that TF-IDF embeddings performed best, particularly when L2-normalized, shifted by adding the absolute minimum value, and paired with Naive Bayes, achieving high recall and a low average time to find relevant documents. Among the SBERT combinations, the highest WSS was achieved by combining z-score or Pareto normalization and absolute minimum scaling with logistic regression; this configuration showed a 3% lower WSS and required additional computational resources. Doc2Vec, although less effective than SBERT, performed well with z-score or Pareto normalization and CDF scaling without needing a GPU. While TF-IDF remains a robust benchmark, SBERT and Doc2Vec offer promising alternatives for improving systematic reviews, warranting further exploration with additional configurations and fine-tuning. Further research should explore more combinations of feature extractors, classifiers, and normalization and scaling techniques.
dc.description.sponsorship: Utrecht University
dc.language.iso: EN
dc.subject: Scaling and normalization of Word Embeddings: Evaluating the impact of ASReview performance
dc.title: Scaling and normalization of Word Embeddings: Evaluating the impact of ASReview performance
dc.type.content: Master Thesis
dc.rights.accessrights: Open Access
dc.subject.courseuu: Applied Data Science
dc.thesis.id: 34981

