Show simple item record

dc.rights.license: CC-BY-NC-ND
dc.contributor.advisor: Önal Ertugrul, I.
dc.contributor.author: Fiorentini, Giacomo
dc.date.accessioned: 2022-12-07T01:01:12Z
dc.date.available: 2022-12-07T01:01:12Z
dc.date.issued: 2022
dc.identifier.uri: https://studenttheses.uu.nl/handle/20.500.12932/43286
dc.description.abstract: Automatic detection of facial indicators of pain has many useful applications in the healthcare domain. Vision transformers are a top-performing architecture in computer vision, yet little research has explored their use for pain assessment. In this thesis, we propose the first fully-attentive automated pain assessment pipeline that achieves state-of-the-art performance on direct and indirect pain detection from facial expressions. The models are trained on the UNBC-McMaster dataset, after faces are 3D-registered and rotated to the canonical frontal view. In our direct pain detection experiments we identify important areas of the hyperparameter space and their interaction with vision and video vision transformers, obtaining three noteworthy models. We also test these models on indirect pain detection and on direct and indirect pain intensity estimation. Our indirect pain detection models underperform their direct counterparts, but still outperform previous works while providing explanations for their predictions. We analyze the attention maps of one of our direct pain detection models and find reasonable interpretations for its predictions. The models perform much worse on pain intensity estimation, revealing the limits of the simple approach chosen. We also evaluate Mixup, an augmentation technique, and Sharpness-Aware Minimization, an optimizer, neither of which improves performance. Our presented models for direct pain detection, ViT-1-D (F1 score 0.55 ± 0.15), ViViT-1-D (F1 score 0.55 ± 0.13), and ViViT-2-D (F1 score 0.49 ± 0.04), all outperform earlier works, showing the potential of vision transformers for pain detection.
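
Note: for readers unfamiliar with Mixup (Zhang et al., 2018), mentioned in the abstract, the sketch below shows the technique in its standard PyTorch form: each batch is blended with a shuffled copy of itself using a coefficient drawn from Beta(alpha, alpha), and the loss is interpolated accordingly. This is a minimal illustration of the general method under assumed names and an assumed alpha value, not the implementation used in the thesis.

    # Minimal Mixup sketch; alpha, the function names, and the assumption
    # of cross-entropy classification are illustrative, not from the thesis.
    import torch
    import torch.nn.functional as F

    def mixup_batch(inputs, labels, alpha=0.4):
        # Sample a mixing coefficient from Beta(alpha, alpha).
        lam = torch.distributions.Beta(alpha, alpha).sample().item()
        # Pair each sample with a randomly chosen partner from the batch.
        perm = torch.randperm(inputs.size(0))
        mixed = lam * inputs + (1.0 - lam) * inputs[perm]
        return mixed, labels, labels[perm], lam

    def mixup_loss(logits, y_a, y_b, lam):
        # Interpolate the loss between the original and permuted labels.
        return (lam * F.cross_entropy(logits, y_a)
                + (1.0 - lam) * F.cross_entropy(logits, y_b))

In training, the mixed frames (or clips, for the video models) would replace the raw batch and mixup_loss would replace the plain cross-entropy loss.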
dc.description.sponsorship: Utrecht University
dc.language.iso: EN
dc.title: Interpretable and explainable vision and video vision transformers for pain detection
dc.type.content: Master Thesis
dc.rights.accessrights: Open Access
dc.subject.keywords: Transformer; Explainable; Interpretable; Pain; Facial expression; Computer Vision; Medicine
dc.subject.courseuu: Artificial Intelligence
dc.thesis.id: 12458

