Developing text-mining methods to review the published literature
Summary
Text mining is considered an effective approach for identifying relevant phenomena in systematic reviews. Topic models have been shown to be a promising unsupervised technique for revealing common topics in text data. This research applied three topic-modeling algorithms, LDA, Top2Vec, and BERTopic, to identify the relevant phenomena in two datasets drawn from the published literature. The first dataset contains bibliographic data for articles on adolescents’ emotional regulation, and the second contains bibliographic data for articles on cooperation in the prisoner’s dilemma; each dataset is divided into abstracts and keywords. The goal of this thesis is to select the optimal number of topics (phenomena) and then map them to a network. Comparing the three algorithms with regard to topic quality and the network representation of the topics, the thesis concludes that BERTopic produced more meaningful topics than Top2Vec and LDA.
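This summary does not say how the topics are mapped to a network. As a minimal sketch of one plausible construction, the Python below links two topics whenever their top-word lists overlap, using Jaccard similarity as the edge weight. The function names, the `threshold` value, and the example top words are all hypothetical and are not taken from the thesis; in practice the top words would come from a fitted LDA, Top2Vec, or BERTopic model.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two collections of words."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def topic_network(topic_words, threshold=0.2):
    """Return weighted edges between topics whose top words overlap.

    topic_words maps a topic id to its list of top words.
    threshold is an assumed cut-off; pairs below it get no edge.
    """
    edges = {}
    for (i, words_i), (j, words_j) in combinations(sorted(topic_words.items()), 2):
        weight = jaccard(words_i, words_j)
        if weight >= threshold:
            edges[(i, j)] = weight
    return edges


# Hypothetical top words, invented for illustration only.
topics = {
    0: ["emotion", "regulation", "adolescent", "stress"],
    1: ["adolescent", "emotion", "parent", "family"],
    2: ["cooperation", "dilemma", "strategy", "game"],
}

edges = topic_network(topics)
for (i, j), weight in edges.items():
    print(f"topic {i} -- topic {j}: weight {weight:.2f}")
```

Word overlap is only one possible edge definition; similarity between topic embeddings or co-occurrence of topics within documents would give a different network, and the choice affects how the three algorithms compare.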