Machine Learning for Sentiment Analysis of Children's Diaries
Summary
This thesis investigates the automatic detection of sentiment expressed by children, using machine learning. The sentiment detection is used within a robotic companion that helps children with the daily struggles of living with Type 1 diabetes. The problem statement that guides this research is: To what extent is it possible to correctly classify the sentiment of a Dutch child’s diary entry by means of automatic text analysis? Results show that machine learning models achieve significantly higher performance than a symbolic sentiment scoring algorithm. Machine learning models have been shown to be better at capturing context and complex negations. The use of an ordinal or time-correlated adaptation of a machine learning model is evaluated as well, but results show that such a model has no advantage over a regular machine learning model. Additionally, this thesis introduces a new algorithm for semantic normalization, applied on top of standard morphological normalization. Results show that adding this step consistently improves performance in the current study. The new semantic normalization algorithm is especially useful here because the dataset is sparse and contains highly infrequent words.