Scaling and Normalization of Word Embeddings: Evaluating the Impact on ASReview Performance
Summary
Active learning makes systematic reviews more efficient by surfacing relevant papers early,
which increases the work saved over random sampling (WSS). This study investigates the impact
of various preprocessing techniques on the performance of active learning models. Specifically,
it evaluates the effectiveness of TF-IDF, SBERT, and Doc2Vec embeddings combined
with different normalization and scaling methods, using Naive Bayes and logistic regression
classifiers.
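To make the WSS metric concrete, the following is a minimal sketch that assumes the commonly used definition: the fraction of records left unscreened when a target recall is reached, minus the fraction that random screening would leave at the same recall. The function name `wss_at_recall` and its input format are hypothetical, and the study's own tooling may compute the metric differently.

```python
import math
from itertools import accumulate


def wss_at_recall(labels, recall=0.95):
    """Work saved over sampling at a target recall (assumed common definition).

    labels: 1 (relevant) or 0 (irrelevant) for each record, listed in the
    order in which the model presented the records for screening.
    """
    if not 0.0 < recall <= 1.0:
        raise ValueError("recall must be in (0, 1]")
    n_records = len(labels)
    n_relevant = sum(labels)
    if n_relevant == 0:
        raise ValueError("need at least one relevant record")

    # Number of relevant records that must be found to reach the target recall.
    n_needed = math.ceil(recall * n_relevant)

    # Count how many records are screened before that many have been found.
    for n_screened, found in enumerate(accumulate(labels), start=1):
        if found >= n_needed:
            break

    # Share of records left unscreened, minus the share random order would leave.
    return (n_records - n_screened) / n_records - (1.0 - recall)
```

Under this definition, a ranking that places every relevant record first gives the highest WSS, and a ranking that places them all last gives a WSS of zero at full recall.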
The findings indicate that TF-IDF embeddings, particularly with L2 normalization and adding
the absolute minimum value paired with Naive Bayes, performed best, achieving high recall
and a low average time to find relevant documents. Among the SBERT combinations, the highest
WSS was achieved with z-score or Pareto normalization and absolute minimum scaling paired
with logistic regression; it was 3% lower than the best TF-IDF result and required more
computational resources.
Doc2Vec, although less effective than SBERT, performed well with z-score or Pareto normalization
and CDF scaling without needing a GPU.
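The normalization and scaling methods named above can be sketched as follows. This is an illustrative, dependency-free sketch using textbook definitions, not the study's implementation. All function names are hypothetical. Two points are assumptions: that "absolute minimum scaling" adds the absolute value of the global minimum, and that "CDF scaling" applies the standard normal CDF.

```python
import math
from statistics import mean, pstdev


def l2_normalize(rows):
    """Scale each row vector to unit Euclidean length (zero rows are kept)."""
    out = []
    for row in rows:
        norm = math.sqrt(sum(v * v for v in row))
        out.append([v / norm for v in row] if norm > 0 else list(row))
    return out


def _per_feature(rows, transform):
    """Apply transform(column) -> column to every feature column."""
    columns = [transform(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*columns)]


def zscore(rows):
    """Centre each feature and divide by its standard deviation."""
    def f(col):
        mu, sd = mean(col), pstdev(col)
        return [(v - mu) / sd if sd > 0 else 0.0 for v in col]
    return _per_feature(rows, f)


def pareto(rows):
    """Centre each feature and divide by the square root of its std. dev."""
    def f(col):
        mu, sd = mean(col), pstdev(col)
        return [(v - mu) / math.sqrt(sd) if sd > 0 else 0.0 for v in col]
    return _per_feature(rows, f)


def add_abs_min(rows):
    """Add |global minimum| to every value (assumption: global, not per
    feature). When the minimum is negative, the smallest value becomes 0."""
    shift = abs(min(v for row in rows for v in row))
    return [[v + shift for v in row] for row in rows]


def cdf_scale(rows):
    """Map every value into (0, 1) with the standard normal CDF (assumption)."""
    return [[0.5 * (1.0 + math.erf(v / math.sqrt(2.0))) for v in row]
            for row in rows]
```

Both `add_abs_min` and `cdf_scale` produce non-negative features, which matters for classifiers such as multinomial Naive Bayes that do not accept negative inputs. A pipeline like the one reported for Doc2Vec would then read `cdf_scale(zscore(embeddings))`, where `embeddings` is a list of row vectors.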
While TF-IDF remains a robust benchmark, SBERT and Doc2Vec offer promising alternatives
for improving systematic reviews. Further research should explore fine-tuning and more
combinations of feature extractors, classifiers, and normalization and scaling techniques.