        Evaluating the Effectiveness, Generalizability, and Explainability of Video Swin Transformers on Automated Pain Detection

        Master_Thesis_Maximilian_Rau-Final.pdf (8.943Mb)
        Publication date
        2024
        Author
        Rau, Maximilian
        Summary
        Recent advances in computer vision, particularly transformer-based models, offer promising potential for setting new benchmarks in automated pain assessment from facial expressions. This thesis explores the efficacy of the Video Swin Transformer (VST), a recent architecture that leverages temporal dynamics and offers the potential for nuanced pain detection across varying scales. Our study applies the VST and compares its performance against other state-of-the-art transformer-based models such as the Swin Transformer and the Vision Transformer (ViT). Through ablation studies, we demonstrated the positive impact of incorporating a greater temporal depth into the model. Additionally, we evaluated the use of Focal loss to mitigate the imbalanced class distribution of the UNBC-McMaster dataset, which turned out to be insufficient. Our research also examined the generalizability of our models across different datasets, highlighting the need for more diverse datasets during training. By extracting attention maps, we gained insight into the explainability of our models, in particular their focus points, confirming that they use pain-related facial regions for decision-making. The results were promising: our best models, VST-0 and VST-1-TD, set new benchmarks with F1-scores of 0.56±0.06 and 0.59±0.04, respectively, and achieved AUC scores of 0.85±0.04 and 0.87±0.03, comparable to the state of the art. This thesis underscores the potential of the VST architecture not only for automated pain assessment but also for the broader analysis of facial expressions.
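The abstract mentions evaluating Focal loss against the class imbalance of the UNBC-McMaster dataset. As a rough illustration only (this is not the thesis's implementation, and the `alpha`/`gamma` values below are the common defaults from the original Focal loss paper, assumed here), binary focal loss down-weights the contribution of well-classified examples so that the rare positive (pain) class is not drowned out:

```python
import math

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss for one prediction.

    p:     predicted probability of the positive (pain) class
    y:     ground-truth label, 0 or 1
    alpha: class-balancing weight for the positive class (assumed default)
    gamma: focusing parameter; gamma=0 recovers weighted cross-entropy
    """
    p_t = p if y == 1 else 1.0 - p            # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    # The (1 - p_t)^gamma factor shrinks the loss of easy examples,
    # leaving hard, misclassified examples to dominate the gradient.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

With `gamma=0` and `alpha=1` this reduces to plain cross-entropy; raising `gamma` increasingly suppresses confident correct predictions, which is the intended imbalance mitigation.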
        URI
        https://studenttheses.uu.nl/handle/20.500.12932/46527
        Collections
        • Theses