        •   Utrecht University Student Theses Repository Home

        Comparative Analysis of Unsupervised Learning Techniques for Topic Extraction in Bank Complaints

        ADS_master_thesis_kitharidis.pdf (1.481Mb)
        Publication date
        2023
        Author
        Kitharidis, Sofoklis
        Summary
        This thesis investigates the application of unsupervised learning algorithms, namely KMeans, Latent Dirichlet Allocation (LDA), BERTopic, and Hierarchical clustering to analyze customer complaint data in the banking sector. The research aims to uncover patterns, topics, and insights from the complaints to enhance customer satisfaction strategies. The problem statement revolves around understanding the impact of different natural language processing methods on the comprehension of financial complaint data and their comparative performance. The key research question addresses how various NLP methods influence the understanding of financial complaint data and how these methods can be compared.

        To address this question, the study utilizes four unsupervised learning algorithms: KMeans, LDA, BERTopic, and Hierarchical clustering. KMeans is employed with Word2Vec, Doc2vec, TF-IDF and BERT embeddings, while LDA is applied using Bag of Words, TF-IDF, and Word2Vec representations. BERTopic with DBSCAN and hierarchical clustering algorithm is also explored with Word2Vec, Doc2vec, TF-IDF and BERT embeddings.

        The analysis reveals significant findings, including the identification of key topics in the customer complaints dataset and the comparison of different clustering approaches. The results demonstrate that KMeans with Word2Vec embeddings achieves the highest cluster separation and density, indicating its superior performance. LDA highlights relevant topics related to loans, payments, communication, debt, and banking services. BERTopic with DBSCAN demonstrates improved cluster separation and provides precise and distinctive topics.

        In summary, this research provides valuable insights into the understanding of financial complaint data using unsupervised learning algorithms. The findings contribute to the development of customer satisfaction improvement strategies in the banking industry. Last but not least, the study addresses ethical considerations, such as privacy and data integrity, ensuring responsible research practices throughout the analysis.
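
As a rough illustration of one configuration named in the summary (KMeans applied to TF-IDF representations), a minimal sketch follows. It is not the thesis's code or data: the four toy complaints, the whitespace tokeniser, the function names `tfidf_vectors` and `kmeans`, and the fixed initial centroids are all assumptions made here so that the example stays small, deterministic, and free of third-party libraries.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, L2-normalised TF-IDF vectors) for a list of strings."""
    tokenised = [doc.lower().split() for doc in docs]  # naive whitespace tokeniser
    vocab = sorted({tok for toks in tokenised for tok in toks})
    n_docs = len(docs)
    # Document frequency: in how many documents each token appears.
    df = Counter(tok for toks in tokenised for tok in set(toks))
    # Smoothed inverse document frequency.
    idf = {tok: math.log((1 + n_docs) / (1 + df[tok])) + 1.0 for tok in vocab}
    vectors = []
    for toks in tokenised:
        tf = Counter(toks)
        vec = [tf[tok] * idf[tok] for tok in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vocab, vectors


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, initial_centroids, max_iter=50):
    """Plain Lloyd iterations: assign to nearest centroid, then recompute means."""
    centroids = [list(c) for c in initial_centroids]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)), key=lambda k: squared_distance(v, centroids[k]))
            for v in vectors
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for k in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical toy complaints, invented for this sketch only.
complaints = [
    "loan payment late fee charged twice",
    "late fee on loan payment not refunded",
    "credit card stolen fraud charge reported",
    "fraud charge on stolen credit card disputed",
]

vocab, vectors = tfidf_vectors(complaints)
# Fixed starting centroids (documents 0 and 2) keep the run deterministic.
labels, centroids = kmeans(vectors, [vectors[0], vectors[2]])
print(labels)  # [0, 0, 1, 1]
```

The two loan-related complaints land in one cluster and the two fraud-related complaints in the other. Swapping the TF-IDF step for Word2Vec, Doc2vec, or BERT embeddings would change only how `vectors` is produced; the clustering step would stay the same.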
        URI
        https://studenttheses.uu.nl/handle/20.500.12932/44626