Text-based empathy detection on social media
Summary
Advances in text-based empathy detection would benefit the field of affective computing, which is concerned with creating systems that have emotional understanding; empathy is a key aspect of emotional intelligence. Empathy detection also has many applications as a tool in its own right. However, only recently have researchers started to focus on this topic, and most studies concentrate on counseling data or on social media centered around psychological support. This thesis first takes a computational approach to the ``Reactions to news stories'' dataset, created by Buechel et al. (2018), using Transformer models to predict empathetic concern (EC) and personal distress (PD) scores. The pre-trained Transformer models BERT and RoBERTa were fine-tuned on the data after a thorough hyper-parameter selection phase. The thesis also explored data augmentation methods, but they did not improve performance on this task. During the model creation phase, the Transformer models, without data augmentation, outperformed the baselines (CNN, FNN, Ridge regression) by approximately 10\%. I conclude that Transformers are capable of predicting the EC and PD scores, even though the task is made more difficult by the limited sample size and by the fact that the scores were self-evaluated by the commenters. Additionally, this thesis investigates the differences between Reddit and Twitter in EC and PD, using the selected BERT balanced model. The selected model was applied to user comments from Reddit and Twitter on the same news articles. These data were gathered as part of this thesis. The results show significantly higher EC and PD scores for tweets than for Reddit comments on the same news articles. Further research should be conducted to investigate the reasons why users behave differently on the two platforms.