Emotional Deep Learning; A Cross-lingual Speech Emotion Recognition Thesis on English and Dutch
Summary
Research on speech emotion recognition is hampered by the scarcity of well-annotated corpora for many languages. This research presents a cross-lingual deep learning approach in which a model is trained on English corpora annotated with seven emotion labels and tested on a similarly annotated Dutch corpus originating from a Dutch oral history archive. A one-dimensional multilayered convolutional neural network architecture is used, and it is also evaluated in a mono-lingual speech emotion recognition experiment on the English corpora. The results show that this architecture approaches state-of-the-art performance for mono-lingual speech emotion recognition, achieving an average accuracy of 0.585 in 5-fold cross-validation. The results of the cross-lingual experiment show that speech emotion recognition is feasible across English and Dutch: the model achieves an accuracy of 0.311 on the Dutch corpus, well above chance. These results enable future work to further explore speech emotion recognition for Dutch by validating and enlarging the Dutch corpus, and by applying techniques reported to significantly improve cross-lingual speech emotion recognition performance to the English-to-Dutch task.
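To put the reported cross-lingual accuracy in context, the following minimal sketch compares it with a uniform-guessing baseline. It assumes that "chance" means guessing uniformly among the seven labels and that the Dutch corpus uses all seven; the summary does not state the class distribution, so a majority-class baseline could be higher. The function and constant names are illustrative only.

```python
def uniform_chance_accuracy(num_classes: int) -> float:
    """Expected accuracy when guessing uniformly at random among num_classes labels."""
    if num_classes < 1:
        raise ValueError("num_classes must be positive")
    return 1.0 / num_classes


# Values taken from the summary; the uniform baseline itself is an assumption.
NUM_EMOTIONS = 7            # seven emotion labels
CROSS_LINGUAL_ACC = 0.311   # reported accuracy on the Dutch corpus

chance = uniform_chance_accuracy(NUM_EMOTIONS)
ratio = CROSS_LINGUAL_ACC / chance

print(f"uniform chance: {chance:.3f}")
print(f"reported accuracy: {CROSS_LINGUAL_ACC:.3f}")
print(f"ratio to chance: {ratio:.2f}")
```

Under this assumption the reported accuracy is roughly twice the uniform baseline. With an imbalanced corpus, the comparison should instead use the relative frequency of the most common label.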