Language morphology in active learning aided systematic reviews
Kroft, Mathijs van der
Active-learning-aided abstract screening can alleviate the labour-intensive process of systematic reviewing. In such a learning cycle, a machine learning model suggests the next abstract to be reviewed, and a researcher classifies that abstract as relevant or irrelevant. A systematic review should include all relevant studies, regardless of the language in which they are written. Machine translation of abstracts helps here, but it is unknown how classification performance changes when abstracts are translated. This study simulates the active learning process with English datasets and with the same datasets machine-translated into German, Spanish and Turkish. A key step in the active learning pipeline is the generation of a vector representation of the text by a feature extractor. The feature extraction methods tf-idf, Doc2Vec, FastText and SBERT were compared on their classification performance in all four languages. The results show no consistent disadvantage to translation for the selected datasets, except when FastText is used.
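The learning cycle described above can be illustrated with a minimal, standard-library-only sketch. It is not the pipeline used in the study: the toy abstracts, the function names (`tfidf`, `centroid`, `simulate`) and the centroid-based scorer are all assumptions made for illustration, with a hand-written smoothed tf-idf standing in for the feature extractor. The known labels play the role of the researcher, which is how a simulation on an already labelled dataset would work.

```python
import math
from collections import Counter

# Hypothetical toy abstracts; label 1 = relevant, 0 = irrelevant.
ABSTRACTS = [
    ("active learning for abstract screening in systematic reviews", 1),
    ("machine translation of abstracts for systematic review screening", 1),
    ("feature extraction for screening abstracts with active learning", 1),
    ("soil moisture and crop yield in dry regions", 0),
    ("crop rotation effects on soil nutrients", 0),
    ("bird migration patterns over dry regions", 0),
]


def tfidf(texts):
    """Feature extractor: one L2-normalised tf-idf vector (a dict) per text."""
    tokenised = [text.lower().split() for text in texts]
    n_docs = len(tokenised)
    doc_freq = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        counts = Counter(tokens)
        # Smoothed idf, so every term keeps a positive weight.
        vec = {
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        }
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({term: w / norm for term, w in vec.items()})
    return vectors


def dot(a, b):
    """Dot product of two sparse vectors stored as dicts."""
    return sum(w * b.get(term, 0.0) for term, w in a.items())


def centroid(vectors):
    """Mean of a non-empty list of sparse vectors."""
    total = Counter()
    for vec in vectors:
        for term, w in vec.items():
            total[term] += w
    return {term: w / len(vectors) for term, w in total.items()}


def simulate(abstracts, prior_relevant=0, prior_irrelevant=3):
    """Simulate screening; return the order in which abstracts are shown."""
    vectors = tfidf([text for text, _ in abstracts])
    labels = [label for _, label in abstracts]
    # Prior knowledge: one relevant and one irrelevant abstract.
    labelled = {
        prior_relevant: labels[prior_relevant],
        prior_irrelevant: labels[prior_irrelevant],
    }
    order = []
    while len(labelled) < len(abstracts):
        # "Train" the stand-in model on everything labelled so far.
        rel = centroid([vectors[i] for i, y in labelled.items() if y == 1])
        irr = centroid([vectors[i] for i, y in labelled.items() if y == 0])
        pool = [i for i in range(len(abstracts)) if i not in labelled]
        # Suggest the unlabelled abstract that looks most relevant.
        nxt = max(pool, key=lambda i: dot(vectors[i], rel) - dot(vectors[i], irr))
        labelled[nxt] = labels[nxt]  # the simulated researcher labels it
        order.append(nxt)
    return order


print(simulate(ABSTRACTS))
```

On this toy data the two unlabelled relevant abstracts are suggested before the irrelevant ones. Swapping `tfidf` for another extractor, or replacing the texts with machine-translated versions, leaves the rest of the loop unchanged, which is the kind of comparison the study describes.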