Comparative Analysis of Unsupervised Learning Techniques for Topic Extraction in Bank Complaints

Kitharidis, Sofoklis

dc.rights.license	CC-BY-NC-ND
dc.contributor	Sofoklis Kitharidis
dc.contributor.advisor	Bosch, Antal van den
dc.contributor.author	Kitharidis, Sofoklis
dc.date.accessioned	2023-08-11T00:02:09Z
dc.date.available	2023-08-11T00:02:09Z
dc.date.issued	2023
dc.identifier.uri	https://studenttheses.uu.nl/handle/20.500.12932/44626
dc.description.abstract	This thesis investigates the application of unsupervised learning algorithms, namely KMeans, Latent Dirichlet Allocation (LDA), BERTopic, and hierarchical clustering, to analyze customer complaint data in the banking sector. The research aims to uncover patterns, topics, and insights from the complaints to enhance customer satisfaction strategies. The problem statement revolves around understanding the impact of different natural language processing methods on the comprehension of financial complaint data and their comparative performance. The key research question addresses how various NLP methods influence the understanding of financial complaint data and how these methods can be compared. To address this question, the study utilizes four unsupervised learning algorithms: KMeans, LDA, BERTopic, and hierarchical clustering. KMeans is employed with Word2Vec, Doc2Vec, TF-IDF and BERT embeddings, while LDA is applied using Bag of Words, TF-IDF, and Word2Vec representations. BERTopic with DBSCAN and the hierarchical clustering algorithm are also explored with Word2Vec, Doc2Vec, TF-IDF and BERT embeddings. The analysis reveals significant findings, including the identification of key topics in the customer complaints dataset and the comparison of different clustering approaches. The results demonstrate that KMeans with Word2Vec embeddings achieves the highest cluster separation and density, indicating its superior performance. LDA highlights relevant topics related to loans, payments, communication, debt, and banking services. BERTopic with DBSCAN demonstrates improved cluster separation and provides precise and distinctive topics. In summary, this research provides valuable insights into the understanding of financial complaint data using unsupervised learning algorithms. The findings contribute to the development of customer satisfaction improvement strategies in the banking industry.
Last but not least, the study addresses ethical considerations, such as privacy and data integrity, ensuring responsible research practices throughout the analysis.
dc.description.sponsorship	Utrecht University
dc.language.iso	EN
dc.subject	Comparative Analysis of Unsupervised Learning Techniques for Topic Extraction in Bank Complaints
dc.title	Comparative Analysis of Unsupervised Learning Techniques for Topic Extraction in Bank Complaints
dc.type.content	Master Thesis
dc.rights.accessrights	Open Access
dc.subject.keywords	Unsupervised Learning Algorithms; KMeans; Latent Dirichlet Allocation (LDA); BERTopic; Hierarchical Clustering; Customer Complaint Data; Banking Sector; Patterns; Topics; Insights; Customer Satisfaction Strategies; Natural Language Processing (NLP); Financial Complaint Data; Comparative Performance; Research Question; Word2Vec; Doc2Vec; TF-IDF; BERT Embeddings; Bag of Words; DBSCAN; Cluster Separation; Cluster Density; Ethical Considerations
dc.subject.courseuu	Applied Data Science
dc.thesis.id	21627

Files in this item

Name: ADS_master_thesis_kitharidis.pdf
Size: 1.481Mb
Format: PDF

This item appears in the following Collection(s)

Theses
