        Towards Interpretable Multimodal Models for Emotion Recognition

        View/Open
        Thesis_K.K._de_Boer.pdf (2.994Mb)
        Publication date
        2024
        Author
        Boer, Kathleen de
        Summary
        This thesis focuses on the development and evaluation of an interpretable multimodal model for emotion recognition, in collaboration with the Dutch Institute of Sound & Vision. The state-of-the-art multimodal model Self Supervised Embedding Feature Transformer (SSE-FT) was fine-tuned and assessed on the Multimodal EmotionLines Dataset (MELD), revealing performance issues. The interpretability framework MM-SHAP was adapted for emotion recognition and extended to cover the text, audio, and video modalities. The proposed interpretability framework and ablation studies showed that the SSE-FT relied predominantly on the textual modality, leading to unimodal collapse. The Dutch language model RobBERT was integrated into the SSE-FT to improve performance, yet training RobBERT independently revealed its limitations in capturing nuanced emotional cues from the MELD dataset. This thesis introduces visualization techniques developed specifically to increase interpretability within individual modalities and to support comparative analysis between the audio and text modalities. The proposed interpretability method and text visualization technique are applied to analyze the textual modality and yield valuable insights into the model's learned emotional cues. The results show that the SSE-FT trained on MELD relies heavily on paralinguistic cues in text and is unable to capture the more nuanced emotional cues in the video and audio modalities. The findings call attention to the need for a balanced, high-quality Dutch dataset for emotion recognition, as well as to the importance of general dataset quality for advancing the field. The proposed interpretability method is found to be effective for making multimodal models for emotion recognition more interpretable.
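
        The summary refers to MM-SHAP-style per-modality contribution scores. As a rough illustration of the underlying idea only (not the thesis's actual implementation), the sketch below estimates token-level Shapley values by permutation sampling and aggregates their absolute values into per-modality shares; the predict callback and modality_of labeling are assumed interfaces invented for this example.

        import random
        from typing import Callable, Dict, List, Sequence

        def modality_shap_shares(
            predict: Callable[[Sequence[bool]], float],
            modality_of: List[str],
            n_permutations: int = 200,
            seed: int = 0,
        ) -> Dict[str, float]:
            """Estimate per-modality Shapley shares, in the spirit of MM-SHAP.

            `predict` maps a boolean mask over the n input tokens (True =
            token visible, False = masked) to a scalar model output, e.g.
            the logit of the predicted emotion class. `modality_of[i]`
            names the modality of token i ("text", "audio", or "video").
            Both are hypothetical interfaces, not the thesis's API.
            """
            rng = random.Random(seed)
            n = len(modality_of)
            shap = [0.0] * n
            for _ in range(n_permutations):
                order = list(range(n))
                rng.shuffle(order)
                mask = [False] * n
                prev = predict(mask)          # all tokens masked
                for i in order:
                    mask[i] = True            # reveal token i
                    curr = predict(mask)
                    shap[i] += curr - prev    # marginal contribution of token i
                    prev = curr
            # Aggregate |phi_i| per modality and normalize to shares.
            totals: Dict[str, float] = {}
            for phi, mod in zip(shap, modality_of):
                totals[mod] = totals.get(mod, 0.0) + abs(phi) / n_permutations
            grand = sum(totals.values()) or 1.0
            # Returns e.g. {"text": 0.7, "audio": 0.2, "video": 0.1}
            # (illustrative values only, not results from the thesis).
            return {mod: v / grand for mod, v in totals.items()}

        A share close to 1.0 for a single modality would indicate the kind of unimodal collapse described above, where the model effectively ignores the other input streams.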
        URI
        https://studenttheses.uu.nl/handle/20.500.12932/46902
        Collections
        • Theses