Language morphology in active learning aided systematic reviews

Kroft, Mathijs van der


Significance of language morphology in active learning aided systematic reviews.pdf (2.537Mb)

Publication date

2022

Summary

Active learning aided abstract screening can reduce the labour-intensive workload of systematic reviewing. In each cycle, a machine learning model suggests the next abstract to review, and a researcher labels that abstract as relevant or irrelevant. A systematic review should include all relevant studies, regardless of the language they are written in. Machine translation of abstracts can help with this, but it is unknown how classification performance changes when abstracts are translated. This study simulates the active learning process on English datasets and on the same datasets machine-translated into German, Spanish and Turkish. A key step in the active learning pipeline is converting each text into a vector representation with a feature extractor. The feature extraction methods tf-idf, Doc2Vec, FastText and SBERT were compared on their classification performance for all languages. For the selected datasets, the results show no consistent drop in classification performance after translation, except with FastText.
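A simulation study can replay the screening cycle described above offline because every label is already known. The following toy sketch illustrates that idea under stated assumptions and does not reproduce the thesis pipeline. It uses a hand-rolled tf-idf (one of the four feature extractors compared), with a smoothed idf similar to scikit-learn's default. A nearest-centroid scorer stands in for the classifier, which the summary does not specify. Querying is certainty-based: the model always shows the abstract that currently looks most relevant. The function names, the toy abstracts and the `prior_idx` starting set are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def tfidf_vectors(docs):
    """Map each document to an L2-normalised sparse tf-idf vector (dict: term -> weight)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each term occurs in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Raw term count times a smoothed idf: ln((1 + n) / (1 + df)) + 1.
        vec = {t: c * (math.log((1 + n_docs) / (1 + df[t])) + 1) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def dot(a, b):
    """Dot product of two sparse vectors stored as dicts."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def centroid(vectors):
    """Element-wise mean of a non-empty list of sparse vectors."""
    total = defaultdict(float)
    for vec in vectors:
        for t, w in vec.items():
            total[t] += w
    return {t: w / len(vectors) for t, w in total.items()}


def simulate_screening(docs, labels, prior_idx):
    """Replay a screening run, letting the known labels play the researcher.

    Returns the order in which the abstracts were shown for screening.
    """
    if {labels[i] for i in prior_idx} != {0, 1}:
        raise ValueError("prior_idx needs at least one relevant and one irrelevant record")
    vectors = tfidf_vectors(docs)
    screened = list(prior_idx)
    unscreened = [i for i in range(len(docs)) if i not in screened]
    while unscreened:
        # "Retrain" the toy model: one centroid per class from the labels seen so far.
        c_rel = centroid([vectors[i] for i in screened if labels[i] == 1])
        c_irr = centroid([vectors[i] for i in screened if labels[i] == 0])
        # Certainty-based query: show the abstract that currently looks most relevant.
        best = max(unscreened, key=lambda i: dot(vectors[i], c_rel) - dot(vectors[i], c_irr))
        unscreened.remove(best)
        screened.append(best)  # the simulated researcher now reveals labels[best]
    return screened


def records_to_full_recall(order, labels):
    """Number of records screened before every relevant record has been found."""
    remaining = sum(labels)
    for n, i in enumerate(order, start=1):
        remaining -= labels[i]
        if remaining == 0:
            return n
    return len(order)


docs = [
    "machine translation of abstracts for screening",
    "active learning for abstract screening",
    "soil erosion in river deltas",
    "translation quality for screening abstracts in german",
    "river flooding and soil moisture",
    "bird migration patterns in europe",
]
labels = [1, 1, 0, 1, 0, 0]  # 1 = relevant, 0 = irrelevant
order = simulate_screening(docs, labels, prior_idx=[0, 2])
print(order)                                  # [0, 2, 3, 1, 5, 4]
print(records_to_full_recall(order, labels))  # 4
```

In this toy run, all three relevant abstracts surface within the first four screened records. One could approximate the comparison the study describes by replacing `tfidf_vectors` with another embedding, such as Doc2Vec, FastText or SBERT, and rerunning the simulation on translated copies of `docs`. `records_to_full_recall` is only one possible metric, chosen here for brevity.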

URI

https://studenttheses.uu.nl/handle/20.500.12932/42714

Collections

Theses