Show simple item record

dc.rights.license: CC-BY-NC-ND
dc.contributor.advisor: Önal Ertugrul, I.
dc.contributor.author: Fiorentini, Giacomo
dc.date.accessioned: 2022-12-07T01:01:12Z
dc.date.available: 2022-12-07T01:01:12Z
dc.date.issued: 2022
dc.identifier.uri: https://studenttheses.uu.nl/handle/20.500.12932/43286
dc.description.abstract: Automatic detection of facial indicators of pain has many useful applications in the healthcare domain. Vision transformers are a top-performing architecture in computer vision, yet little research has explored their use for pain assessment. In this thesis, we propose the first fully-attentive automated pain assessment pipeline that achieves state-of-the-art performance on direct and indirect pain detection from facial expressions. The models are trained on the UNBC-McMaster dataset, after faces are 3D-registered and rotated to the canonical frontal view. In our direct pain detection experiments we identify important areas of the hyperparameter space and their interaction with vision and video vision transformers, obtaining three noteworthy models. We also test these models on indirect pain detection and on direct and indirect pain intensity estimation. Our indirect pain detection models underperform their direct counterparts, but still outperform previous works while providing explanations for their predictions. We analyze the attention maps of one of our direct pain detection models and find reasonable interpretations for its predictions. The models perform much worse on pain intensity estimation, revealing the limits of the simple approach chosen. We also evaluate Mixup, an augmentation technique, and Sharpness-Aware Minimization, an optimizer, neither of which improves performance. Our presented models for direct pain detection, ViT-1-D (F1 score 0.55 ± 0.15), ViViT-1-D (F1 score 0.55 ± 0.13), and ViViT-2-D (F1 score 0.49 ± 0.04), all outperform earlier works, showing the potential of vision transformers for pain detection.
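
Note: for readers unfamiliar with Mixup (Zhang et al., 2018), mentioned in the abstract, the sketch below shows the technique in its standard PyTorch form: each batch is blended with a shuffled copy of itself using a coefficient drawn from Beta(alpha, alpha), and the loss is interpolated accordingly. This is a minimal illustration of the general method under assumed names and an assumed alpha value, not the implementation used in the thesis.

    # Minimal Mixup sketch; alpha, the function names, and the assumption
    # of cross-entropy classification are illustrative, not from the thesis.
    import torch
    import torch.nn.functional as F

    def mixup_batch(inputs, labels, alpha=0.4):
        # Sample a mixing coefficient from Beta(alpha, alpha).
        lam = torch.distributions.Beta(alpha, alpha).sample().item()
        # Pair each sample with a randomly chosen partner from the batch.
        perm = torch.randperm(inputs.size(0))
        mixed = lam * inputs + (1.0 - lam) * inputs[perm]
        return mixed, labels, labels[perm], lam

    def mixup_loss(logits, y_a, y_b, lam):
        # Interpolate the loss between the original and permuted labels.
        return (lam * F.cross_entropy(logits, y_a)
                + (1.0 - lam) * F.cross_entropy(logits, y_b))

In training, the mixed frames (or clips, for the video models) would replace the raw batch and mixup_loss would replace the plain cross-entropy loss.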
dc.description.sponsorship: Utrecht University
dc.language.iso: EN
dc.title: Interpretable and explainable vision and video vision transformers for pain detection
dc.type.content: Master Thesis
dc.rights.accessrights: Open Access
dc.subject.keywords: Transformer; Explainable; Interpretable; Pain; Facial expression; Computer Vision; Medicine
dc.subject.courseuu: Artificial Intelligence
dc.thesis.id: 12458

