A Deep Learning Approach to Interest Analysis
Meer, T. van der
Analysing the interests of young adolescents, expressed in short, colloquial Dutch text, is a challenging task for pre-trained neural networks. Four pre-trained Dutch language models are compared and contrasted through quantitative and qualitative tests. Three additional fine-tuned models are included in the qualitative tests to assess transfer learning capabilities. For the quantitative comparison, a classifier is trained on a named entity recognition and a sentiment analysis task. For the qualitative comparison, the outputs of the embedding layer are used to gain insight into relation classification and clustering. A test ranking the similarity of interest pairs was developed to investigate the models' semantic understanding of Dutch. Furthermore, the models' ability to cluster related interests is examined. Finally, known relation structures among sports, instruments and school courses are put to the test. BERTje outperforms the other models on the quantitative tasks, yet performs worst on the triplet ranking test. Fine-tuned RobBERT and FastText show the best results in the triplet analysis. None of the models demonstrate semantic understanding in the clustering analysis. FastText shows the most semantic understanding of the relation structures, though still relatively little. The outputs of the embedding layer show that the models do not have a semantic understanding of Dutch but fall back on morphological structures. These techniques are therefore not yet ready to be used for interest analysis. Adding a downstream task, data enrichment and knowledge infusion are candidate improvements for interest analysis.
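The triplet ranking test described above can be sketched as follows. This is a minimal illustration, not the thesis's actual implementation: the interest words, the three-dimensional toy vectors, and the helper names (`cosine_similarity`, `rank_triplet`) are all hypothetical stand-ins for a real model's embedding-layer outputs.

```python
import numpy as np

def cosine_similarity(a, b):
    # Standard cosine similarity between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def rank_triplet(anchor, positive, negative, embed):
    # A model "passes" a triplet when it ranks the related interest pair
    # (anchor, positive) above the unrelated pair (anchor, negative).
    return (cosine_similarity(embed[anchor], embed[positive])
            > cosine_similarity(embed[anchor], embed[negative]))

# Hypothetical toy embeddings standing in for a model's embedding layer.
embeddings = {
    "voetbal": np.array([0.9, 0.1, 0.0]),  # football (anchor)
    "hockey":  np.array([0.8, 0.2, 0.1]),  # a related sport
    "piano":   np.array([0.1, 0.1, 0.9]),  # an unrelated interest
}

print(rank_triplet("voetbal", "hockey", "piano", embeddings))  # True
```

A model with semantic understanding should pass most such triplets; scoring a set of them gives the ranking comparison reported in the abstract.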