Towards Performance Comparability: An Implementation of New Metrics into the ASReview Active Learning Screening Prioritization Software for Systematic Literature Reviews
Simulation-based Active Learning (AL) studies have demonstrated the potential of machine learning methods to reduce the manual screening workload in systematic literature reviews (SLRs). The second most used performance metric in this field is Work Saved over Sampling (WSS), which aims to measure the reduction in screening effort. A drawback of the WSS metric, however, is its sensitivity to dataset class imbalance, which leads to biased performance comparisons across datasets. In this light, two main features were added to the state-of-the-art, open-source simulation software ASReview, which offers a unique infrastructure for testing combinations of AL models and feature extractors across datasets. First, the confusion matrix was implemented in the ASReview software; it was subsequently used to implement the True Negative Rate (TNR), which has been shown to be equal to the normalized WSS (Kusa et al., 2023). These additions, previously absent from the software, represent a step towards a more comprehensive understanding of AL performance in SLR tasks. Specifically, the adjustment for class imbalance facilitates further study of data characteristics related to model performance beyond class imbalance alone. This enhanced understanding enables researchers and practitioners to make more informed decisions when selecting and fine-tuning AL models, ultimately leading to more efficient screening in practice.
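To make the relationship between the two metrics concrete, the following is a minimal sketch of the standard definitions involved: WSS at a given recall level R is commonly defined as (TN + FN)/N − (1 − R), and the TNR (specificity) as TN/(TN + FP). The function names and the example confusion-matrix counts below are illustrative, not taken from the ASReview codebase.

```python
def wss(tn: int, fn: int, n: int, recall: float) -> float:
    """Work Saved over Sampling at a given recall level:
    WSS@R = (TN + FN) / N - (1 - R)."""
    return (tn + fn) / n - (1 - recall)


def tnr(tn: int, fp: int) -> float:
    """True Negative Rate (specificity): TN / (TN + FP)."""
    return tn / (tn + fp)


# Hypothetical screening simulation: 1000 records, 50 relevant.
# Screening stops after 400 records, having found 48 relevant ones.
tp, fn = 48, 2          # relevant records found / missed
fp, tn = 352, 598       # irrelevant records screened / saved
recall = tp / (tp + fn)  # 0.96

print(wss(tn, fn, 1000, recall))  # ≈ 0.56
print(tnr(tn, fp))                # ≈ 0.63
```

Because the TNR depends only on the negative class, it is unaffected by the ratio of relevant to irrelevant records, which is the property that makes it comparable across datasets with different class imbalance.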