Towards Performance Comparability: An Implementation of New Metrics into the ASReview Active Learning Screening Prioritization Software for Systematic Literature Reviews
Summary
Simulation-based Active Learning (AL) studies have demonstrated the potential of machine
learning methods to reduce manual screening workload in systematic literature reviews (SLRs).
The second most-used performance metric in this field is Work Saved Over Sampling (WSS),
which aims to measure the reduction in screening effort. A drawback of the WSS metric,
however, is its sensitivity to dataset class imbalance, which biases performance comparisons
across datasets.
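As an illustration of this sensitivity (a sketch using the standard definitions of the two
metrics, not notation taken from the ASReview codebase): for a screening run evaluated at
recall level $r$, with $TP$, $FP$, $TN$, and $FN$ denoting the cells of the confusion matrix
and $N$ their sum,

$$\mathrm{WSS}@r = \frac{TN + FN}{N} - (1 - r), \qquad \mathrm{TNR}@r = \frac{TN}{TN + FP}.$$

Because $TN + FP$ counts only the irrelevant records, the TNR is unaffected by the proportion
of relevant records in a dataset, whereas the $N$ in the WSS denominator mixes both classes.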
In this light, two main features were added to the state-of-the-art, open-source simulation
software ASReview, which offers a unique infrastructure for testing different combinations of
AL models and feature extractors across datasets. First, the confusion matrix was implemented
in the ASReview software; it was subsequently used to implement the True Negative Rate (TNR),
which has been shown to be equal to the normalized WSS (Kusa et al., 2023).
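As a minimal, self-contained sketch of how these quantities can be derived from a simulation's
output (the function name and signature below are hypothetical, not part of ASReview's API;
the sketch assumes labels ordered by the ranking the AL model produced):

```python
import numpy as np

def confusion_at_recall(ordered_labels, recall=0.95):
    """Confusion matrix for a screening stopped once `recall` of the
    relevant records has been found. `ordered_labels` contains 1
    (relevant) or 0 (irrelevant), in the order the model ranked them."""
    labels = np.asarray(ordered_labels)
    target = int(np.ceil(recall * labels.sum()))
    # First position at which the recall target is reached.
    cutoff = int(np.flatnonzero(np.cumsum(labels) >= target)[0]) + 1
    screened, unscreened = labels[:cutoff], labels[cutoff:]
    tp, fp = screened.sum(), len(screened) - screened.sum()
    fn, tn = unscreened.sum(), len(unscreened) - unscreened.sum()
    return tp, fp, tn, fn

tp, fp, tn, fn = confusion_at_recall([1, 1, 0, 1, 0, 0, 1, 0, 0, 0], recall=0.75)
tnr = tn / (tn + fp)                                # robust to class imbalance
wss = (tn + fn) / (tp + fp + tn + fn) - (1 - 0.75)  # sensitive to class imbalance
print(f"TNR@75%: {tnr:.2f}, WSS@75%: {wss:.2f}")    # TNR@75%: 0.83, WSS@75%: 0.35
```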
These advancements, previously absent from the software, represent a step towards a more
comprehensive understanding of AL performance in SLR tasks. Specifically, the adjustment for
class imbalance facilitates further study of the data characteristics, beyond class imbalance
itself, that relate to model performance. This enhanced understanding enables researchers and
practitioners to make more informed decisions when selecting and fine-tuning AL models,
ultimately leading to more efficient screening in practice.