Evaluating the Effectiveness of the Topic Models LDA and BTM for Uncovering Topics in Open-Ended Employee Engagement Survey Responses
Summary
A key challenge in analyzing open-ended survey answers is that manual analysis is labor-intensive. To address this challenge, this research investigated whether natural language processing (NLP) techniques, specifically topic modeling, can automate the discovery of topics in unstructured text answers. The study compared two topic modeling methods, Latent Dirichlet Allocation (LDA) and the Biterm Topic Model (BTM), on their ability to uncover latent topics and thereby extract meaningful themes from open-ended responses. The data were preprocessed and the model parameters were tuned. The LDA model failed to provide meaningful insight into the underlying topics. The BTM model, which uses biterms to address sparse word co-occurrence in short texts, generated topics with interpretable sets of top words and proved valuable for extracting latent topics from unstructured, unlabeled text data. With some manual adjustment and labeling of these topics, the results can be applied in the analysis of open-ended responses, for example in combination with topic classification and sentiment analysis techniques. This research contributes to the field of topic modeling by showing the effectiveness of BTM on unstructured text data from employee engagement surveys. BTM is a promising model for uncovering topics in data with varying text lengths and structures.
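To make the biterm idea concrete: in BTM, a biterm is an unordered pair of distinct words that co-occur in the same short text, and topics are learned from the corpus-wide collection of biterms rather than from per-document word counts. The following is a minimal sketch of the biterm extraction step only, not the study's implementation. The function name `extract_biterms` and the example responses are hypothetical and are not taken from the survey data.

```python
from collections import Counter
from itertools import combinations


def extract_biterms(tokens):
    """Return every unordered pair of distinct words in one short document.

    Hypothetical helper for illustration; each pair is sorted so that
    ("team", "good") and ("good", "team") count as the same biterm.
    """
    return [
        tuple(sorted(pair))
        for pair in combinations(tokens, 2)
        if pair[0] != pair[1]
    ]


# Made-up, already preprocessed responses (assumption: tokenized,
# lowercased, stop words removed).
responses = [
    ["good", "team", "atmosphere"],
    ["team", "workload", "high"],
    ["workload", "high", "stress"],
]

# Pool biterms over the whole corpus. Each response alone yields only
# three word pairs, but pooling exposes pairs that recur across responses.
biterm_counts = Counter(
    biterm for doc in responses for biterm in extract_biterms(doc)
)

for biterm, count in biterm_counts.most_common(3):
    print(biterm, count)
```

Pooling the pairs across all responses is what offsets the sparse word co-occurrence within any single short answer. A full BTM would then infer topic distributions from these pooled biterms.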