        xSPAM: Introducing and Evaluating Explainability for a Supervised Topic Model

        View/Open
        Final_version_xSPAM_Marijn_van_Wietmarschen_28_11_2025.pdf (765.9Kb)
        Publication date
        2026
        Author
        Wietmarschen, Marijn van
        Metadata
        Show full item record
        Summary
        This thesis investigates how humanitarian organisations can better analyse large volumes of short, unstructured feedback by using explainable and adaptive topic modelling. In collaboration with the Netherlands Red Cross, the research addresses the limitations of manual feedback labelling, which is slow, inconsistent, and difficult to scale during fast-moving crises. Existing topic modelling approaches often struggle with short texts, lack transparency, and rarely adapt to changes in language or needs over time.

        To tackle these challenges, the thesis introduces xSPAM, the Explainable Supervised Pachinko Allocation Model. xSPAM is a supervised, hierarchical topic model that extends the Pachinko Allocation Model by integrating document labels directly into the topic learning process. This allows the model to learn topic structures that are not only descriptive but also predictive of predefined categories. In addition, xSPAM is designed with explainability in mind, enabling the reconstruction of how individual words contribute to a predicted label, and includes a post-processing module to detect word and topic drift over time.

        The model is trained using supervised Gibbs sampling, where topic assignments are influenced both by the likelihood of the text and by how well they support correct label prediction. For unseen documents, xSPAM combines similarity-based initialisation, local refinement, and a trained classifier to infer labels efficiently. Explainability is achieved by marginalising over the learned topic hierarchy to estimate how strongly specific words are associated with particular labels, while word trend evolution is analysed using frequency-based measures of popularity and momentum across time slices. The system is evaluated through quantitative experiments and a qualitative user study with Red Cross analysts.
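The two explainability mechanisms described in the summary can be sketched in a few lines of NumPy. This is a minimal illustration, not xSPAM's actual implementation: it assumes the topic hierarchy has already been marginalised down to flat conditional distributions (`p_label_given_topic` and `p_topic_given_word`), and the function names and array shapes are illustrative.

```python
import numpy as np

def word_label_scores(p_label_given_topic, p_topic_given_word):
    """Estimate word-to-label association by marginalising over topics:
    score(label, word) = sum_t P(label | t) * P(t | word).

    p_label_given_topic: (topics, labels), rows sum to 1
    p_topic_given_word:  (topics, words),  columns sum to 1
    Returns a (labels, words) matrix whose columns sum to 1.
    """
    return p_label_given_topic.T @ p_topic_given_word

def popularity_and_momentum(counts):
    """Frequency-based trend measures over time slices.

    counts: (time_slices, words) raw word frequencies per slice.
    Popularity = relative frequency of each word within its slice;
    momentum   = change in popularity between consecutive slices.
    """
    totals = counts.sum(axis=1, keepdims=True)
    popularity = counts / np.where(totals == 0, 1, totals)
    momentum = np.diff(popularity, axis=0)
    return popularity, momentum
```

A word with high popularity but negative momentum would be flagged as fading, while a low-popularity word with strong positive momentum would signal an emerging concern in the feedback stream.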
Quantitatively, xSPAM achieves classification accuracy comparable to state-of-the-art baselines such as BERTopic and supervised LDA on real humanitarian datasets, but this predictive focus increases model complexity and reduces topic coherence. Qualitatively, the user study reveals a clear trade-off in explainability methods. Visualisations showing model confidence and prediction certainty are perceived as clear, useful, and trustworthy, whereas fine-grained word-level attributions are often met with scepticism. The word trend analysis component is seen as confusing and insufficiently connected to analysts’ core tasks. Overall, the thesis concludes that in the context of short, noisy humanitarian feedback, high-level transparency—particularly through clear visualisations of model confidence—is more valuable than detailed word-based explanations. While xSPAM demonstrates that supervised, explainable topic modelling is feasible and effective, the findings highlight the importance of balancing predictive performance, interpretability, and usability when deploying AI systems in sensitive, real-world humanitarian settings.
        URI
        https://studenttheses.uu.nl/handle/20.500.12932/50904
        Collections
        • Theses