Show simple item record

dc.rights.license: CC-BY-NC-ND
dc.contributor.advisor: Önal Ertugrul, I.
dc.contributor.author: Rau, Maximilian
dc.date.accessioned: 2024-06-19T23:01:54Z
dc.date.available: 2024-06-19T23:01:54Z
dc.date.issued: 2024
dc.identifier.uri: https://studenttheses.uu.nl/handle/20.500.12932/46527
dc.description.abstract: Recent advancements in computer vision, particularly transformer-based models, offer promising potential for establishing new benchmarks in automated pain assessment from facial expressions. This thesis explores the efficacy of the Video Swin Transformer (VST), a recent architecture that leverages temporal dynamics and multi-scale features for nuanced pain detection. We apply the VST and compare its performance against other transformer-based state-of-the-art models such as the Swin Transformer and the Vision Transformer (ViT). Through ablation studies, we demonstrate the positive impact of incorporating greater temporal depth into the model. Additionally, we evaluate the use of Focal loss to mitigate the imbalanced class distribution of the UNBC-McMaster dataset, which turned out to be insufficient. Our research also examines the generalizability of our models across datasets, highlighting the need for more diverse data in the training phase. By extracting attention maps, we gain insight into the models' explainability, in particular their focus points, confirming that they rely on pain-related facial regions for decision-making. The results are promising: our best models, VST-0 and VST-1-TD, set new benchmarks with F1-scores of 0.56±0.06 and 0.59±0.04, respectively, and achieve AUC scores of 0.85±0.04 and 0.87±0.03, comparable to the state of the art. This thesis underscores the potential of the VST architecture not only for automated pain assessment but also for the broader analysis of facial expressions.
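The abstract mentions using Focal loss to counter the imbalanced class distribution of the UNBC-McMaster dataset. As context, a minimal PyTorch sketch of binary focal loss (Lin et al., 2017) is shown below; the `gamma` and `alpha` values are the commonly used defaults, not necessarily the hyperparameters chosen in the thesis.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0, alpha=0.25):
    """Binary focal loss: down-weights well-classified (easy) examples so
    that rare-class frames (e.g. pain) contribute more to the gradient.
    gamma=2.0 and alpha=0.25 are the common defaults, assumed here."""
    p = torch.sigmoid(logits)
    # Plain per-example cross-entropy as the base term.
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    # Probability assigned to the true class of each example.
    p_t = p * targets + (1 - p) * (1 - targets)
    # Class-balancing weight: alpha for positives, 1-alpha for negatives.
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    # (1 - p_t)^gamma shrinks the loss of confident, correct predictions.
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()
```

The modulating factor `(1 - p_t)^gamma` is what distinguishes this from weighted cross-entropy: confident correct predictions are suppressed multiplicatively, so training focuses on the hard minority-class examples.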
dc.description.sponsorship: Utrecht University
dc.language.iso: EN
dc.subject: Investigations of the Video Swin Transformer for automated pain detection. In addition to performance evaluation and comparison with other state-of-the-art models, temporal dynamics, the application of Focal loss, generalizability, and explainability were explored.
dc.title: Evaluating the Effectiveness, Generalizability, and Explainability of Video Swin Transformers on Automated Pain Detection
dc.type.content: Master Thesis
dc.rights.accessrights: Open Access
dc.subject.keywords: Automated Pain Detection, Transformer, Video Swin Transformer, Generalizability, Explainable AI, Focal Loss
dc.subject.courseuu: Artificial Intelligence
dc.thesis.id: 31601

