Comparative Analysis of Unsupervised Learning Techniques for Topic Extraction in Bank Complaints
Summary
This thesis investigates the application of unsupervised learning algorithms, namely KMeans, Latent Dirichlet Allocation (LDA), BERTopic, and Hierarchical clustering, to analyze customer complaint data in the banking sector. The research aims to uncover patterns, topics, and insights from the complaints to enhance customer satisfaction strategies.
The problem statement concerns the impact of different natural language processing (NLP) methods on the comprehension of financial complaint data. The key research question is how various NLP methods influence the understanding of financial complaint data and how their performance can be compared.
To address this question, the study utilizes four unsupervised learning algorithms: KMeans, LDA, BERTopic, and Hierarchical clustering. KMeans is employed with Word2Vec, Doc2Vec, TF-IDF, and BERT embeddings, while LDA is applied using Bag of Words, TF-IDF, and Word2Vec representations. BERTopic with DBSCAN and the hierarchical clustering algorithm are also explored with Word2Vec, Doc2Vec, TF-IDF, and BERT embeddings.
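As an illustration of one of the pipelines named above (KMeans applied to TF-IDF vectors), a minimal sketch follows. It uses only the Python standard library and is not the implementation used in this thesis: the toy complaints, the function names `tfidf_vectors` and `kmeans`, the smoothed IDF formula, and the fixed initial centroids are all assumptions chosen to keep the example small and deterministic.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return L2-normalised TF-IDF vectors (as lists) and the sorted vocabulary."""
    tokenised = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenised for tok in toks})
    n_docs = len(docs)
    # Document frequency: number of documents containing each token.
    df = Counter(tok for toks in tokenised for tok in set(toks))
    # Smoothed IDF (an assumption; other weighting schemes are equally valid).
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    vectors = []
    for toks in tokenised:
        tf = Counter(toks)
        vec = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vectors, vocab


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, init_indices, n_iter=20):
    """Plain Lloyd iterations, seeded with the vectors at init_indices."""
    centroids = [list(vectors[i]) for i in init_indices]
    labels = [0] * len(vectors)
    for _ in range(n_iter):
        # Assignment step: each vector joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda c: sq_dist(v, centroids[c]))
            for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical toy complaints, not drawn from the thesis dataset.
complaints = [
    "loan payment late fee charged",
    "loan interest payment overcharged",
    "credit card fraud unauthorized charge",
    "credit card stolen fraud report",
]

vectors, vocab = tfidf_vectors(complaints)
labels = kmeans(vectors, init_indices=(0, 2))
print(labels)  # [0, 0, 1, 1]
```

In this toy run the two loan complaints fall in one cluster and the two card-fraud complaints in the other. Swapping `tfidf_vectors` for a different embedding step is what distinguishes the KMeans configurations compared in the study.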
The analysis reveals significant findings, including the identification of key topics in the customer complaints dataset and the comparison of different clustering approaches. The results demonstrate that KMeans with Word2Vec embeddings achieves the highest cluster separation and density, indicating its superior performance. LDA highlights relevant topics related to loans, payments, communication, debt, and banking services. BERTopic with DBSCAN demonstrates improved cluster separation and provides precise and distinctive topics.
In summary, this research provides valuable insights into the understanding of financial complaint data using unsupervised learning algorithms. The findings contribute to the development of customer satisfaction improvement strategies in the banking industry. Finally, the study addresses ethical considerations, such as privacy and data integrity, to ensure responsible research practices throughout the analysis.