Comparison of Acoustic Feature Representation Methods for Apparent Personality Recognition
Summary
This thesis examines the performance of Fisher Vector representations in classifying personality traits from audio. The multimodal ChaLearn LAP First Impressions dataset is used, with the focus on its audio modality. Several audio feature extraction methods, including wav2vec 2.0, openSMILE, and the Public Dimensional Emotion Model (PDEM), are compared on the classification task, and different encoding approaches, such as the Fisher Vector, are studied for their effect on classifier performance. The results suggest that Fisher Vector representations are not the best choice for classifying personality traits from audio on this dataset, whereas other feature extraction methods, such as openSMILE LLDs and PDEM, achieve good performance. The thesis also offers insights into the selection of parameters for feature engineering and into the interpretability of Fisher Vector representations.
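For readers unfamiliar with the encoding step, the Fisher Vector idea can be sketched briefly: a GMM is fitted to frame-level features, and each variable-length utterance is represented by the normalized gradients of its log-likelihood with respect to the GMM's means and standard deviations, yielding a fixed-length vector. The sketch below is illustrative only and not the thesis's actual pipeline; the GMM size, feature dimensionality, and the random data standing in for real openSMILE LLD frames are assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fisher_vector(frames, gmm):
    """Encode a (T, D) frame sequence as a fixed-length Fisher Vector.

    Uses a GaussianMixture with diagonal covariances and returns the
    concatenated mean and standard-deviation gradients (length 2*K*D).
    """
    T, _ = frames.shape
    q = gmm.predict_proba(frames)          # (T, K) soft assignments
    mu = gmm.means_                        # (K, D) component means
    sigma = np.sqrt(gmm.covariances_)      # (K, D) diagonal std devs
    w = gmm.weights_                       # (K,) mixture weights

    parts = []
    for k in range(gmm.n_components):
        z = (frames - mu[k]) / sigma[k]    # standardized residuals
        qk = q[:, k:k + 1]
        # Gradient w.r.t. means, with the usual Fisher-information scaling
        g_mu = (qk * z).sum(axis=0) / (T * np.sqrt(w[k]))
        # Gradient w.r.t. standard deviations
        g_sig = (qk * (z ** 2 - 1)).sum(axis=0) / (T * np.sqrt(2 * w[k]))
        parts.extend([g_mu, g_sig])
    return np.concatenate(parts)

# Toy example: random frames stand in for real frame-level audio features
rng = np.random.default_rng(0)
background = rng.normal(size=(500, 8))     # frames used to fit the GMM
gmm = GaussianMixture(n_components=4, covariance_type="diag",
                      random_state=0).fit(background)
utterance = rng.normal(size=(120, 8))      # one utterance's frames
fv = fisher_vector(utterance, gmm)
print(fv.shape)                            # (64,) = 2 * 4 components * 8 dims
```

Whatever the number of frames T, the output length is fixed at 2·K·D, which is what allows a standard classifier to be trained on utterances of different durations.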